Dxwg: Should we split the DXWG GitHub Repo?

Created on 5 Feb 2020 · 13 Comments · Source: w3c/dxwg

Below I report some of the points that arose during recent calls.
Please consider contributing your thoughts to the discussion.

Among the proposals discussed:

  • One distinct repository for each deliverable, i.e., a repo for UC, DCAT, PROF & ConnegP and perhaps Guidance;
  • Splitting off UC but maybe not the rest;
  • Splitting off non-DCAT work and keep DCAT where it is;

Among the advantages of splitting:

  • It may make it easier to see what's going on compared to the current single, large, repo;
  • It is a good opportunity to make more visible the issues relating to the work of the specific deliverable;
  • Easier management of errata (one errata document in each repo containing a Rec, instead of either one document for all Recs, or one errata document per Rec all in the same repo and sharing the unassigned errata);

Among the concerns expressed:

  • We don't want to lose any of the discussions; we need to ensure that all issues and discussions are retained (see Philippe's suggestion on how this could be achieved);
  • It is too late in the day for a repo split, which would take too long, for little benefit and at high risk;

If we go for the split, Philippe has suggested cloning the repository (keeping the commit history) and then transferring the issues to the clone; GitHub supports transferring issues to another repository (https://help.github.com/en/github/managing-your-work-on-github/transferring-an-issue-to-another-repository).

For a better understanding of how and where the points above emerged, see the minutes of these recent calls:
https://www.w3.org/2020/01/14-dxwg-minutes
https://www.w3.org/2020/01/21-dxwg-minutes#x03
https://www.w3.org/2020/01/21-dxwg-minutes#x08
https://www.w3.org/2020/02/04-dxwg-minutes#x12
https://www.w3.org/2020/02/04-dxwg-minutes#x14
https://www.w3.org/2020/01/28-dxwg-minutes

All 13 comments

I strongly support splitting into separate product-aligned repositories.

I'm still a bit concerned about splitting / cloning the repo, because of the cons outlined by @riccardoAlbertoni, although I recognise that having separate repos would help reduce the "noise".

A clarification: if we clone the repo, what will happen to the references we have across issues? Will the links point to the issues in the original repo or in the new one?

I admit it might be handy to have separate repositories. However, I share some of the concerns about splitting the repository. I do not know whether the advantages outweigh the effort, as I do not understand how demanding and problematic splitting the repository might turn out to be.

Perhaps the only way to address these concerns is to give it a try with a small-scale test: we could start by cloning the repo and turning it into a repository for just one of our documents.

Is there a member who can volunteer for this "experiment in the small"? Which document should we try it on?

I support splitting the repositories, on the understanding that the original repo continues to exist. We can reduce noise by transferring only genuine issues (i.e. those with a specific actionable decision that can be addressed by a PR, as distinguished from useful but rambling, exploratory, or explanatory threads). We should include references to old threads that discuss related matters. Issues that cross deliverable boundaries can also be cross-referenced, and if they need to be on the "issue log" for a deliverable, we can always create an issue there to track the resolution of an issue on another deliverable.

I'd like to share some thoughts about repository splitting and DCAT.

I think it would be useful to keep the GitHub issues, wiki pages, etc. related to DCAT in the same repo. In particular, I'd like to avoid forcing DCAT adopters who want to check past and ongoing discussions to jump between different GitHub repositories.

Also, the current DCAT REC explicitly refers to many of the GitHub issues in the current DXWG repo. If we want to be sure not to break those links, we might want to leave everything DCAT-related where it is now.

This clearly does not prevent splitting off or adding new repositories for the other group documents, which might have completely different desiderata.

For profiles, I fear that splitting repos will not spare us the pain of examining old (open) issues to port them to the new ones, just as we would have to examine them to resolve them in the old one.

So in the end we may not save time with the split-repo route, and may even spend more, if we count all the discussion about setting up and populating the repos, as this very thread shows. Plus we could lose the cross-group synergy that could be quite useful (especially for UCs).

That said, reducing the noise could be good, even though the GitHub hum has actually gone down quite a bit recently anyway.

So I guess I'm rather against the split in the end, but I can really live with it if the majority thinks it's more appropriate for their work.

Just including the email Karen sent (see [1]) in this thread, as it seems very relevant to this discussion.

I had a look at the issue of issues and splitting github. First, there
is the count of open issues:

DCAT: 88
Conneg: 14 (2=won't fix, 4=due for closing)
PROF: 52 (5=due for closing)
Guidance: 92

Then I looked at which projects have issues that overlap with other
projects. This is based on the assignment of labels in github. I did not
find an easy way to check the links between issues, which is another
question.

PROF + Guidance: 9 (3=due for closing)
DCAT + Guidance: 4 (1=due for closing)

Conneg had zero overlaps with any other project, and there was no
overlap between DCAT and PROF.

My conclusions from this are:

  1. If we are thinking of leaving one of the projects where it is,
    my vote would be for DCAT, which has the most open issues and probably
    the most visibility (we've received the most comments from outside the
    working group).
  2. Conneg is small and may be completed soon. It could probably be moved out
    without affecting any other projects. But if it does go to
    REC soon, and further versions are not anticipated, it might be best
    to leave it where it is and save the effort of separating it.
  3. PROF has some overlap with Guidance, although it is only 6 items.
    Also, there is no work currently on Guidance, so if PROF is separated
    out it would make sense first to look at those issues and see if they
    are needed for PROF.
  4. Guidance: I'm not sure what to say about this since it isn't being
    worked on actively. It could be separated out now, or it could be
    separated only if it becomes active again. It has a lot of open issues,
    so it would take some effort to separate it from the DXWG GitHub space.

kc

[1] https://lists.w3.org/Archives/Public/public-dxwg-wg/2020Feb/0040.html

I don't know why the mail archives haven't included this reply to that question, which clarifies the actual state of issues for PROF:

Note these issue tag count numbers are not necessarily reflective of the work involved in splitting...

PROF has some 13 issues listed in the document [1] (issues that seem to relate to something unresolved or unactioned within the scope of the specification; other issues would remain as historical discussions in the original repo unless reopened by consensus of a reactivated PROF subgroup). I'd certainly be happy to undertake the editorial extraction of these specific issues and the current state of their discussion if it's too hard to simply copy them to the new repo. I'm guessing that new issue numbers would need to be introduced in a new ED.

New issues that relate to potential changes to the specification itself can always be proposed and added to the new repo; they should, however, reference some specific clause of the specification and not be about the general nature of things. IMHO we need an improved process to make sure issues are actionable, and also to stop them becoming multi-threaded. Suggestions for an improved procedure are welcome here. (I admit I've been guilty of responding in detail to off-topic discussions in issues and hence perpetuating past bad practice, but if we had a better set of guidelines to follow, it would be easy to exercise editorial control over issues.)

I have no objection to leaving DCAT in place (that would be a decision for the DCAT group), but I would recommend they do a bulk close of all non-DCAT issues at the very least. I have no interest in trying to stay on top of two sets of issues per deliverable.

A profiles guidance Note (it would never be a REC by nature) could be restarted at any time, and the repo split at that point, but it may be cleaner to do the split once if the existing repo is to have any ongoing role beyond holding historical issue discussions.

Rob Atkinson

[1] https://www.w3.org/TR/dx-prof/#issue-summary

We now have:

They have the same admin settings, labels, branch protection, access rights, etc. as w3c/dxwg.

Open issues have been transferred, except for those that were applicable to more than one repo.

I did however transfer 2 issues from profile-guidance to dx-prof:
https://github.com/w3c/dx-prof/issues/7
https://github.com/w3c/dx-prof/issues/3

If those need to be moved back to the dxwg repo, please let me know.

Note that, unfortunately, transferring doesn't carry over the labels, so I need to recreate them after the transfer (I have a separate snapshot of all w3c/dxwg issues from before the move to help me ensure that).

I'll transfer the closed issues over time.

Another side effect: closed issues don't report their closing dates properly in search results. Compare
https://github.com/w3c/dx-connegp/issues?q=is%3Aissue+is%3Aclosed
and
https://github.com/w3c/dx-connegp/issues/14

Sounds like a GH bug to me...

looks like one to me, too. ;-)

As the repo has eventually been split, I think we can close this issue.

Any objections?

Closing.
