Kibana: [Discuss] Periodic repo health review

Created on 11 Feb 2020  Ā·  12Comments  Ā·  Source: elastic/kibana

Our recommended best practice for folder structure is to organize by domain, but this can be difficult to coordinate among many teams that don't have a high level view of the entire code base.

I'd like to propose a periodic "folder structure review" by either the platform team or a group of delegates from a few teams to take a look at the top level folders/plugins every now and then, and see if we can make any recommendations or request changes. This will help us avoid getting into the same situation we got into with the legacy ui/public folder.

Some recent situations I ran into:

  • SIEM, endpoint, resolver embeddable, nested under "Security" folder? What about the Security area teams folder?
  • es_ui_shared has some common components used by a few different plugins - rollups, cross_cluster_replication, ingest_management. At least some of those components would be a better fit in kibana_react. Do we need an xpack version?

thoughts?

cc @joshdover @epixa @peterschretlen

discuss

Most helpful comment

We discussed this during the Platform Team's weekly today. Some thoughts that came up:

  • Auditing plugins is very time consuming. There is concern that this won't be a very scalable solution.
  • Elastic area teams are organized to be quite autonomous, and we like it this way. We worry that "mandating" or even "highly encouraging" tech debt cleanup work is going to be an uphill challenge. Augmenting or even prioritizing cleanup work based on external metrics may make this easier (eg. bug reports, support case metrics, test coverage metrics).
  • We prefer proactive and automated solutions when possible. There are some things in the list above that could be solved with better or more linting rules.

Some thoughts on next steps forward:

  • What if we tried splitting this up between things that could be automated or measured (eg. things that could be solved with eslint, test coverage metrics) and things that require manual work (eg. directory structure, tight coupling to Core APIs)

    • We can start prioritizing implementation of the automated work in the Platform and Operations (?) teams.

    • For the manual work, we can start with one topic (I prefer directory structure), do an audit with a group of 2-3 people, and report back results. We can see how this goes and evaluate success based on effort to audit, how easily we can get teams to make changes, and whether or not there is a real ROI on this.

Aside from overall repo health, I think there may be a real need for individual _plugin health_. I suspect there may be more value in this than overall repo health, especially if we can focus on customer-facing impact, like bug reports.

One thing I've seen done in other organizations is a "Top 10 bugs list" that is well publicized within the organization (even outside engineering) and backed up by real metrics. This kind of visibility has a large potential to actually cause change. We could do something similar with just bugs or even with some of the other general health items (I could easily imagine a "Top 10 untested modules features" list).

All 12 comments

@joshdover suggested we make this more of a general "repo health" check that additionally includes things like:

  • Test coverage reports
  • Catching bad practices and pushing for rules to be added to catch them when possible (like no any type, no export * at the top level).
  • And other convention violations

Perhaps this forum could be done async as well with the outcome being a report of any violations found, tagging the team in question and creating tracking/issues.

I created the kibana-convention-council slack channel for these kinds of discussions.

Other "health" things that come to mind which would be helpful to check:

  • Are private APIs leaking into a plugin's public contracts (a common side effect of export *)
  • Are we sharing things that don't necessarily need to be shared (e.g. a 1-line utility that's only used in 2 places)
  • Are there shared pieces of code which have been needlessly placed in a generic location (e.g. kibana_utils or es_ui_shared), when they could logically belong to a specific plugin domain
  • For plugins with shared registries, are we following recommended patterns with those (e.g. avoiding registering new items in start)
  • Are plugins structured to correctly handle async plugin loading, or to handle optional dependencies which might not be present

These are things that are easy for someone to overlook as it isn't possible to lint for all of them.

Periodic checks of the repo sound good, but I’d like to clarify what are the issues we're trying to solve with these checks/reviews. Reading through the GH issue I’m seeing the following:

  • Contributors aren’t necessarily aware of best practices and pitfalls to watch for. They’re not always evident.
  • We’d like to detect problems early before they take root and spread. Automated checks like linting don’t work for everything.
  • We lack a measure of ā€˜health’ we can track and report over time. This might include things like code coverage.
  • ...?

Is that a good characterization? I'd just want us to be clear on the main issues and objectives.

I think also it’s worth asking if there are ways we might address these in our existing processes, before we consider new ones.

Say for example contributors aren’t aware of certain pitfalls, and that’s the main issue. What if there was a short checklist to run through during code review to help detect common pitfalls? That might be a decent alternative to a centralized review.

Yep, great points! I think it's more about the two first bullet points than the last one. A measurement like code coverage would be great, but other than that, I'm not sure how else we could automatically measure health.

Say for example contributors aren’t aware of certain pitfalls, and that’s the main issue. What if there was a short checklist to run through during code review to help detect common pitfalls? That might be a decent alternative to a centralized review.

We have a variety of ways developers can learn about these best practices today:

Some of those should be combined, but some are targeted towards a specific situation.

Agree we should focus on trying to solve the root of the problem, but I think there is an advantage to having dedicated time/process for reviewing the current state of the repo to ensure the conventions are being understood and adhered to. This can expose situations where our documentation falls short and we need to update. A dedicated channel can be a place for people to ask clarifying questions/highlight inconsistencies. We could decide this is part of a single area teams responsibility (or tech leads), but to me, it feels like a mixture of all of those (maybe even ops team too).

One thing that could be useful here is a definition of what types of things do we consider important enough to the entire project to be auditing.

In other words, which things are really worth enforcing / facilitating across the whole project and which aren't?

In my opinion, we should focus on holding the quality line in the areas that can leak out of their domains and create problems for other teams. For example:

  • Directory structure and organization. A disorganized and inconsistent file tree, at least at the first few levels, can quickly get out of hand. Having consistency here is helpful for new and existing developers alike.
  • API style & semantic conventions. (at both the HTTP _and_ JS level) These have cross-cutting effects on both development teams and users of our software APIs. Providing consistency here makes consuming shared APIs and services easier and more robust.
  • Usage of Core & shared APIs. (eg. integration with application service's router, embeddables). Ensuring that API integrations are done correctly can have effects on UX. For example, an incorrect integration with the SPA router in the Kibana Platform can lead to UX bugs from inconsistent routing behaviors.
  • Test coverage of integration points. (eg. any API exposed by a plugin for other plugins should be well tested). Bugs and instability in APIs that are shared across plugins leads to bugs across the product.
  • Tight coupling to Platform APIs. The more tight coupling we have to Core and other shared APIs (eg. data plugin), the harder it is for these APIs to evolve. We will inevitably need to change these APIs in the future and anything we can do to make that switch / migration easier is probably worth doing now before these problems grow further.

Things that I don't think are as important to be consistent:

  • Test coverage. Instability in any part of Kibana can have adverse effects on others, though the Kibana Platform's design does make this a bit less of a problem. I think as an Engineering organization we should have test coverage targets, but I think these tend to be a function of the structure and maturity of individual teams & products and there can be a lot of variance here. I don't really see a problem with that. I do think having good test coverage metrics and reports is useful, but I don't think we need a process for reviewing these at a project-wide level.
  • Usage of TypeScript. Many teams at Elastic have been successful with TypeScript and I generally think it's a great tool that enhances stability _and_ development velocity. However, some teams & products have more tolerance for failure than others. Maybe a tool to show metrics of unsafe type usages would be useful, but again I don't think we should require a specific level of type safeness for all teams.
  • Adoption of specific libraries. In general, the only libraries that all plugins must use is EUI and React. We do not push more opinionated libraries on plugins (eg. Redux) and I think may be a good thing. Allowing plugins to experiment and innovate on their own terms gives teams a lot of agency. We should continue to keep build general tools when we see multiple plugins implementing the same thing in different ways. However, I don't think forcing a solution on all teams is useful or valuable in many situations. UX consistency is perhaps one exception.
  • Code style. (controversial āš ļø) While this can affect some integration points, on the whole, this generally just affects the developers of the code. As a Platform developer, I don't really care if one team uses one style over another, as long as the public APIs conform to what I expect. I think you could argue that consistent style helps developers in one area understand the code in another. I just think our existing linter is good enough for this. If teams want to opt-out in certain places, I don't have any particular problem with this.

Great write up @joshdover - I think that sounds great. I just ran across one more that is good to keep an eye on - things that blow up the size of the repo like giant mock files or data archives, especially for testing purposes.

We discussed this during the Platform Team's weekly today. Some thoughts that came up:

  • Auditing plugins is very time consuming. There is concern that this won't be a very scalable solution.
  • Elastic area teams are organized to be quite autonomous, and we like it this way. We worry that "mandating" or even "highly encouraging" tech debt cleanup work is going to be an uphill challenge. Augmenting or even prioritizing cleanup work based on external metrics may make this easier (eg. bug reports, support case metrics, test coverage metrics).
  • We prefer proactive and automated solutions when possible. There are some things in the list above that could be solved with better or more linting rules.

Some thoughts on next steps forward:

  • What if we tried splitting this up between things that could be automated or measured (eg. things that could be solved with eslint, test coverage metrics) and things that require manual work (eg. directory structure, tight coupling to Core APIs)

    • We can start prioritizing implementation of the automated work in the Platform and Operations (?) teams.

    • For the manual work, we can start with one topic (I prefer directory structure), do an audit with a group of 2-3 people, and report back results. We can see how this goes and evaluate success based on effort to audit, how easily we can get teams to make changes, and whether or not there is a real ROI on this.

Aside from overall repo health, I think there may be a real need for individual _plugin health_. I suspect there may be more value in this than overall repo health, especially if we can focus on customer-facing impact, like bug reports.

One thing I've seen done in other organizations is a "Top 10 bugs list" that is well publicized within the organization (even outside engineering) and backed up by real metrics. This kind of visibility has a large potential to actually cause change. We could do something similar with just bugs or even with some of the other general health items (I could easily imagine a "Top 10 untested modules features" list).

For the manual work, we can start with one topic (I prefer directory structure), do an audit with a group of 2-3 people, and report back results. We can see how this goes and evaluate success based on effort to audit, how easily we can get teams to make changes, and whether or not there is a real ROI on this.

++, and I agree on directory structure as the initial topic.

I volunteer myself. Can we get one volunteer from platform team and one from app arch team?

I'll create an issue in kibana-team repo as a place to summarize notes/findings, and can set up a recurring meeting. How does 30min every other week sound? We can try to do most of the work async, and have the meeting as forcing function to identify areas we'll focus on over the next couple weeks, and for topics that are difficult to solve async.

++ to Josh's breakdown of important vs less-important items, as well as the next steps.

Would we consider the question of domain boundaries (e.g. what to do about things like es_ui_shared) to be part of this discussion, or a different category? If we aren't grouping it with directory structure, this topic would probably be my preference for the next area to address which cannot be automated.

Can we get one volunteer from platform team and one from app arch team?

I'd be happy to help on the app arch side.

Would we consider the question of domain boundaries (e.g. what to do about things like es_ui_shared) to be part of this discussion, or a different category?

I would consider it part of the same discussion. Directory structure / domain organization.

Can we get one volunteer from platform team and one from app arch team?

I can rep the Platform team for the first go around as well.

How does 30min every other week sound?

Every other week to start works, maybe less frequently after we get going?

@tylersmalley

Closing this as we have been having our periodic repo review check ins and storing the summaries and notes elsewhere.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

tbragin picture tbragin  Ā·  3Comments

bhavyarm picture bhavyarm  Ā·  3Comments

bradvido picture bradvido  Ā·  3Comments

treussart picture treussart  Ā·  3Comments

ctindel picture ctindel  Ā·  3Comments