Windowscommunitytoolkit: Might-be-trivial issue to you: the repo size is growing

Created on 15 Feb 2017 · 20 Comments · Source: windows-toolkit/WindowsCommunityToolkit

It used to take a flash to clone the repo. Now the repo size has grown to a whopping 257MB, with but 21720 git objects in the pack to be downloaded.

Why do I say whopping? For comparison:
Git itself has 286253 git objects, with a size of 112MB
.NET Corefx has 749047 git objects, with a size of 181MB.
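
To make the comparison concrete, here is a rough sketch that divides each quoted size by its object count. It assumes "MB" means 1024 * 1024 bytes, and the dictionary keys are just labels chosen for illustration. Under that assumption the toolkit averages about 12 KiB per object, roughly 30 times the figure for Git itself.

```python
# Average download size per git object, using the figures quoted above.
# Assumption: "MB" is treated as 1024 * 1024 bytes.
MIB = 1024 * 1024

# label -> (size in MB, number of git objects); labels are illustrative only
repos = {
    "WindowsCommunityToolkit": (257, 21720),
    "git": (112, 286253),
    "corefx": (181, 749047),
}


def bytes_per_object(size_mb, objects):
    """Return the mean number of bytes per object."""
    return size_mb * MIB / objects


averages = {name: bytes_per_object(*figures) for name, figures in repos.items()}

for name, avg in averages.items():
    print(f"{name}: {avg / 1024:.2f} KiB per object")
```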

With a checkout on the master, we can see this folder info:

82M     ./docs
464K    ./githubresources
924K    ./Microsoft.Toolkit.Uwp
796K    ./Microsoft.Toolkit.Uwp.Notifications.NETStandard
664K    ./Microsoft.Toolkit.Uwp.Notifications.Portable
1.7M    ./Microsoft.Toolkit.Uwp.Notifications.UWP
2.2M    ./Microsoft.Toolkit.Uwp.Notifications.WinRT
75M     ./Microsoft.Toolkit.Uwp.SampleApp
1.5M    ./Microsoft.Toolkit.Uwp.Services
852K    ./Microsoft.Toolkit.Uwp.UI
880K    ./Microsoft.Toolkit.Uwp.UI.Animations
2.9M    ./Microsoft.Toolkit.Uwp.UI.Controls
2.0M    ./Notifications
43M     ./UnitTests
976K    ./UnitTests.Notifications.Portable
16M     ./UnitTests.Notifications.UWP

Going further down, we can discover that a lot of the culprits are images, e.g. under UWPCommunityToolkit/docs/resources/images we have this list:

82K adaptive.GIF
141K AddNugetServices.png
266K Animations-Blur.gif
34K Animations-Fade.gif
967K Animations-FadeHeader.gif
132K Animations-Light.gif
93K Animations-Offset.gif
285K Animations-Rotate.gif
177K Animations-Scale.gif
163K choosetoolboxitems.png
5.5M Controls-AdaptiveGridView.gif
625K Controls-BladeView.gif
117K Controls-DropShadowPanel.png
502K Controls-Expander.gif
12K Controls-GridSplitter.png
1.4M Controls-HamburgerMenu.gif
2.8K Controls-HeaderedTextBlock.png
1.9M Controls-ImageEx.gif
6.7M Controls-MarkdownTextBlock.gif
3.8M Controls-MasterDetailsView.gif
125K Controls-PullToRefreshListView.gif
289K Controls-RadialGauge.gif
24K Controls-RangeSelector.gif
1.7M Controls-RotatorTile.gif
230K Controls-ScrollHeader.gif
231K Controls-SlidableListItem.gif
5.8K Controls-TextBoxMask.png
5.8K Controls-TextBoxRegex.png
9.2K Controls-WrapPanel.png
2.3M hamburgermenu.gif
8.5K head.GIF
496K herotile.png
132K imageex.GIF
8.0M LoadingXamlControl.gif
36K ManageNugetPackages.png
183K Notifications-LiveTile.gif
174K Notifications-PopToast.gif
18K Notifications-WeatherLiveTileAndToast.png
51K NugetPackages.png
27M ParallaxService.gif
23K radial.GIF
6.1K range.GIF
18M ReorderGrid.gif
101K sampleapp.png
45K sampleapp-small.png
104K slideable.GIF
63K SurfaceDialTextboxAnim.gif
16K SurfaeDial.jpg
118K TileControl.gif
4.5K toolboxfinal.png
43K Toolkit_Responsive_Behavior_v01_img-MD-SM.png
49K Toolkit_Responsive_Behavior_v01_img-XL.png
25K Toolkit_Responsive_Behavior_v01_img-XS.png
12K weatherlivetilentoastNotification.GIF
60K WinSDKFBInstall.png

Seriously guys, one single gif to be 27M?

No intention to offend anyone, but git was designed to version text data, not binary images. Once you commit and push, an object stays in the history almost forever; removing it is possible but very hard. In our case, we can't even escape with the git clone --depth=1 trick that you can use if you dare to clone the Linux kernel (object count approaching 6 million, download size around 2GB), since one recent commit will still drag in all these images.
(edit: I have now tried cloning our toolkit with depth 1: we download 1276 objects, with a download size of 105MB)
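
For anyone who wants to reproduce numbers like these on a local clone, the following is a minimal sketch built on `git count-objects -v`, which reports the packed object count (`in-pack`) and the pack size in KiB (`size-pack`). The helper names are hypothetical, and the sketch assumes `git` is on the PATH. Note that it measures the local pack after cloning, which may differ somewhat from the size reported during download.

```python
import subprocess


def parse_count_objects(text):
    """Parse `git count-objects -v` output into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def repo_pack_stats(repo_dir="."):
    """Return (packed object count, pack size in MiB) for a local clone."""
    out = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    stats = parse_count_objects(out)
    # size-pack is reported in KiB, so divide by 1024 to get MiB.
    return stats.get("in-pack", 0), stats.get("size-pack", 0) / 1024
```

Running `repo_pack_stats()` once in a full clone and once in a `--depth=1` clone gives a like-for-like comparison of the two.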

Having a huge repo is bad. Think about how fast CI/CD can run: Jenkins/AppVeyor can only start to build after the clone finishes. And then there is the wasted Internet bandwidth, plus kittens.

Sorry about my ultra-sensitivity on this. I was testing the built-in git clone function of my Store app using the toolkit repo as a target, and found that it's extremely slow...

help wanted improvements open discussion

All 20 comments

I agree. The repository should be huge. Until everyone can use GVFS, I think we can reduce the size of images/gif.

@xied75 What about the SampleApp folder? There is an abnormal size too.

I agree on this. Also, I wasn't able to compile the solution for some reason when I cloned the repo from scratch. I don't remember the reason, but there are some things that prevent the solution from building. Not sure if they're still there though. I need to check again.

@Odonno you mean it should be huge? :) Regarding the SampleApp, UWPCommunityToolkit/Microsoft.Toolkit.Uwp.SampleApp/Assets/Photos

Interesting feedback. What do you recommend? If we push an update to reduce gif size, the originals will still be in the history.

@deltakosh That would solve the --depth=1 case and speed up CI/CD. Regarding the big things in history... I can try to find a solution if we prefer to solve this.

I would like both:)
As a first step I agree we can at least reduce image size, if someone wants to volunteer;)

We should probably also set a RULE regarding image size.
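
One way such a rule could be enforced is a small check that a CI job runs before merging. The sketch below is only an illustration: the 1 MiB limit, the extension list, and the function names are assumptions, not anything the project has agreed on.

```python
import os

# Hypothetical policy values; the thread does not specify an actual limit.
IMAGE_EXTENSIONS = {".gif", ".png", ".jpg", ".jpeg"}
MAX_BYTES = 1024 * 1024  # 1 MiB


def oversized_images(root, max_bytes=MAX_BYTES):
    """Return (path, size) pairs for images above max_bytes, largest first."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own object store.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in IMAGE_EXTENSIONS:
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if size > max_bytes:
                    offenders.append((path, size))
    return sorted(offenders, key=lambda item: item[1], reverse=True)


def main(root="."):
    """Print offenders and return a process exit code (1 if any were found)."""
    found = oversized_images(root)
    for path, size in found:
        print(f"{size / 1024:.0f} KiB  {path}")
    return 1 if found else 0
```

A CI step could call `sys.exit(main("."))` so the build fails when an oversized image is added. The extension match is case-insensitive because the docs folder mixes `.gif` and `.GIF`.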

Do you want to try reducing picture size?

I'm guessing items like the 27 meg gif will need to be re-recorded, not just compressed

@skendrot I think gifs need resizing.. I will push it on a separate branch

@deltakosh I think instead of certain animated gifs we might be better off with screen capture video as wmv

both parallax and animate grid videos are way too large

Hey there! What's the current status of this issue?

@azchohfi @HerrickSpencer any thoughts on how we optimize our git history? Pulling down the repo is like 436MB now! However, 419MB of that is the .git folder...

I don't think there is a way to do it without losing history, whether by force-pushing the removal of the big files or by rewriting all commits on master. It would also be a pain for anyone who has already cloned or forked the repo, or who has a PR currently open.

@azchohfi would a good time to do something like that be maybe for when we swap over to WinUI 3 as we'd be modifying so many files anyway?

We could test it and see how much it would save (probably a lot).

Interesting. I will investigate this option. I'm hopeful there are some steps we can take. I'll suggest a few for us to discuss.

Relaying internal conversation:
I've done some investigation on options to reduce our repo size.

As far as I see it we don't have too many good ones that don't involve making a divergent master branch.

The main folder causing the size bulk is .git (445MB), likely due to the large history we have.
None of the other folders are so huge. The sample app is the next largest at <7MB.

Atlassian gives some good options; some involve making a branch that shares a base with the first commit and squashes all commits after that. I did this locally, then cloned from that local branch, and got the same size .git folder.

So I'm not sure this would work unless we actually orphan the master branch, and relegate a copy of the history to an entirely different repo.

Since all options I have discovered still cause a divergent branch, all current consumers of the repo would need to force pull from the repo to get the smaller size. This could cause issues for people, so I suggest a doc/notification to explain how to do this.

One other interesting option is setting up a Sparse-Checkout option in our current repo, to exclude folders that aren't expressly needed to work as a developer. I suspect that there won't be too many of these, and of these folders they won't cause much of a size decrease.

Conclusion: if a 445MB history is causing issues in the community, we can set up an alternate repo that shares a base commit. This way we can pull changes between the repos. This will add a lot of complexity, but would likely work.

The suggestion is to do a 'hard history reset' every so often (after some number of commits, or years) that will push all changes older than a recent release into a secondary history repo, and reset the history on the current repo to a shared commit (the release commit).

I will also look into deleting some very large files (images?) from the history... there is an option to GC these out of the history as well. I can look into this option.
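
As a starting point for that investigation, the sketch below lists the largest blobs anywhere in the history by piping `git rev-list --objects --all` into `git cat-file --batch-check`. The function names are hypothetical and it assumes `git` is on the PATH. The sizes it reports are uncompressed blob sizes, so they overstate what each file costs inside the pack, but they are usually good enough to spot the worst offenders.

```python
import subprocess


def parse_batch_check(text):
    """Parse 'type sha size path' lines and return blobs, largest first."""
    blobs = []
    for line in text.splitlines():
        # maxsplit=3 keeps paths that contain spaces in one piece.
        parts = line.split(" ", 3)
        if len(parts) == 4 and parts[0] == "blob":
            _, sha, size, path = parts
            blobs.append((int(size), sha, path))
    return sorted(blobs, reverse=True)


def largest_blobs(repo_dir=".", top=20):
    """Return the `top` largest blobs reachable from any ref."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, capture_output=True, check=True,
    ).stdout
    listing = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, capture_output=True, check=True,
    ).stdout.decode("utf-8", errors="replace")
    return parse_batch_check(listing)[:top]
```

This only reads the repository. Actually dropping the blobs it finds still means rewriting history, with the force-push consequences described above.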

Per conversation with @michael-hawker, the realistic option is that we do the hard reset option when we make a huge refactor of code base, such as a move to WinUI 3, then start with a new history of 7.0 + the merge of the WinUI3 branch. The rest of the history can then be archived to its own repo, and linked with a base commit.

We'll consider this at next large release.

I vote for leaving the current as is. And starting a brand new repo.
