This was found through the DocumentFormat.OpenXML library, which uses System.IO.Packaging extensively (original issue: https://github.com/OfficeDev/Open-XML-SDK/issues/244). The issue logged there involves generating a large Excel document, which uses a working set of around 10 MB on .NET 4.7, while on .NET Core memory grows quickly until it hits an OutOfMemoryException. I've simplified the issue to remove the dependency on DocumentFormat.OpenXML, and it appears to be isolated to writing to a Part within a Package.
using System;
using System.IO;
using System.IO.Packaging;
namespace MemoryRepro
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var fs = new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite))
            using (var package = Package.Open(fs, FileMode.Create))
            {
                var part = package.CreatePart(new Uri("/part", UriKind.Relative), "something/sometype");
                using (var stream = part.GetStream())
                using (var writer = new StreamWriter(stream))
                {
                    for (var i = 0; i < int.MaxValue; i++)
                    {
                        writer.Write("hello");
                    }
                }
            }
        }
    }
}
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net47</TargetFramework>
    <!--<TargetFramework>netcoreapp2.0</TargetFramework>-->
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="System.IO.Packaging" Version="4.4.0" />
  </ItemGroup>
</Project>
This repro code appears to have a working set of around 60 MB when running on .NET 4.7, while its memory usage grows very quickly on .NET Core 2.0.
The error on .NET Core 2.0 is:
Unhandled Exception: System.IO.IOException: Stream was too long.
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.Compression.WrappedStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.IO.StreamWriter.Write(String value)
at MemoryRepro.Program.Main(String[] args) in c:\users\tasou\source\repos\MemoryRepro\MemoryRepro\Program.cs:line 21
I did some more investigating, and it's only seen on .NET Core because the .NET 4.6 build just contains type forwarders to the System.IO.Packaging classes in WindowsBase.
It appears that the reason the memory is growing is due to the ZipArchiveMode passed into the ZipArchive. The following is what the packaging library is doing (simplified to show the cause):
using System.IO;
using System.IO.Compression;

class Program
{
    static void Main(string[] args)
    {
        // Causes memory leak
        var mode = ZipArchiveMode.Update;
        // Does not leak
        //var mode = ZipArchiveMode.Create;
        using (var fs = new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite))
        using (var archive = new ZipArchive(fs, mode, true, System.Text.Encoding.UTF8))
        {
            var entry = archive.CreateEntry("entry", CompressionLevel.NoCompression);
            using (var stream = entry.Open())
            using (var writer = new StreamWriter(stream))
            {
                for (var i = 0; i < int.MaxValue; i++)
                {
                    writer.Write("hello");
                }
            }
        }
    }
}
The Package creates the ZipArchive here: https://source.dot.net/#System.IO.Packaging/System/IO/Packaging/ZipPackage.cs,349.
It looks like there are a few other open performance issues for ZipArchive. @ianhays Any thoughts?
"The following is what the packaging library is doing (simplified to show the cause)"
@twsouthwick, it's doing the exact same thing on both .NET Framework and .NET Core?
The distinction between .NET Framework and .NET Core was a red herring; the .NET Framework build was using a System.IO.Packaging that redirects to the implementation in WindowsBase. When I ran the code in corefx manually against .NET Framework, it repros the same. So, yes, it is doing the same thing on both .NET Framework and .NET Core.
"So, yes, it is doing the same thing on both .NET Framework and .NET Core."
I'm a little confused :) What is doing the same thing? Is ZipArchive behaving the same between .NET Framework and .NET Core, but System.IO.Packaging is behaving differently between the two? Or is the issue you cited with Packaging also reproing with .NET Framework?
Sorry about that.
The ZipArchive snippet is what the .NET Core implementation of System.IO.Packaging uses; I don't know what is used in the .NET Framework implementation since it's in WindowsBase.
Thanks, @twsouthwick. I took a quick look. It looks like Package in WindowsBase uses its own internal ZipArchive that has streaming support.
That would explain it. Are there plans to add streaming support to ZipArchive in CoreFX?
There's an issue open for it that's had some good discussion but hasn't moved forward due to it being a lower priority addition.
This has caused different behavior when using the OpenXml SDK library between NetFX and Core. It makes it impossible to create huge spreadsheets without busting memory limits. The implications are rather big as the library can no longer be used if the dataset is large enough in .NET Core.
@ianhays Is there a timeline as to when a design may be available? What can be done to help move that forward? This issue is blocking for people who generate large office documents on the fly and it would be nice to know what steps need to be taken.
There is not a timeline currently. More "likes" on the issue always helps as does discussion of the API on the issue and comments explaining use cases.
It's pretty high up on the priority list as far as compression investments go.
IGNOREME: I am commenting here so I can easily find this issue again in the future, since I can't do that by subscribing alone.
Voting for this to be fixed.
I have an ASP.NET Core application, but my target framework is .NET 4.6.2. Based on what I understand, I should not face this issue.
@srini1978 The System.IO.Packaging on .NET Framework and .NET Core use different implementations of ZipArchive. Running on .NET Framework will not hit this issue.
I am also voting for this problem to be fixed.
I hit this bug too! On .NET Core, the DocX library for working with .doc files does not work correctly.
The issue was opened half a year ago and there are no changes =(
@ianhays Is this (and the zip streaming feature) being tracked for .NET Core 2.2?
@twsouthwick They are not being tracked for .NET Core 2.2 or 3.0 to my knowledge.
cc: @terrajobst
Hello. Almost a year has passed, but it seems the bug is not going to be fixed. This error blocks the use of .NET Core in our reporting solution.
+1 -- this is blocking my team from creating a scalable Excel file writing solution in .NET Core using Open-XML-SDK.
I took a closer look at this. It's very much related to the ZipArchiveEntry behavior mentioned here: https://github.com/dotnet/corefx/issues/11669#issuecomment-468016815
Also mentioned by @twsouthwick above.
When opening a Package with ReadWrite access, the underlying archive is opened in Update mode, which causes every entry to be buffered completely in a MemoryStream. A MemoryStream has an upper limit of int.MaxValue bytes, so in addition to using a lot more memory than it needs, this means that no single entry can be larger than int.MaxValue bytes (about 2 GB).
When opening a Package with FileAccess.Read or FileAccess.Write you won't hit the case where ZipArchiveEntry stores uncompressed data in memory. This can permit you to work with Packages with large files: only open them for FileAccess.Read or FileAccess.Write.
Today there is a bug in the .NET Core implementation of Package which blocks the use of FileAccess.Write. I have a fix for that which I'll submit shortly, and which should unblock the issue pointed out in the original posting here.
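For reference, the write-only approach described above might look roughly like this. It is a minimal sketch, not a verified fix: it assumes a System.IO.Packaging build in which FileAccess.Write is no longer blocked, and the names `WriteOnlyPackageSketch` and `WriteLargePart` are hypothetical, as are the part URI and content type (borrowed from the repro in the original post).

```csharp
using System;
using System.IO;
using System.IO.Packaging;

static class WriteOnlyPackageSketch
{
    // Hypothetical helper: writes `repeat` copies of "hello" into a single part.
    public static void WriteLargePart(string path, long repeat)
    {
        // Assumption: opening with FileAccess.Write (not ReadWrite) keeps the
        // underlying archive out of Update mode, so the entry is not buffered
        // completely in a MemoryStream.
        using (var package = Package.Open(path, FileMode.Create, FileAccess.Write))
        {
            var part = package.CreatePart(new Uri("/part", UriKind.Relative), "something/sometype");

            // The part stream is disposed before the package so the entry is
            // finished before the package itself is closed.
            using (var stream = part.GetStream(FileMode.Create, FileAccess.Write))
            using (var writer = new StreamWriter(stream))
            {
                for (long i = 0; i < repeat; i++)
                {
                    writer.Write("hello");
                }
            }
        }
    }
}
```

A call such as `WriteOnlyPackageSketch.WriteLargePart(Path.GetTempFileName(), 1000000)` would exercise it. The trade-off is that a package opened this way cannot be read back or updated in the same session.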
Although S.IO.Packaging on desktop does define "streaming" support, that's not actually at play here as far as I can tell. One thing that is at play is that on desktop the System.IO.Packaging implementation of zip had a fancier stream for the update case. It behaved in a similar way to ZipArchiveEntry where updates would decompress the entire entry for access, but it would back that decompressed stream in a mix of memory+file. https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Packaging/CompressStream.cs,716
There's still the scenario of a package opened for Update that needs to Read / Write large files. For that, let's let issue dotnet/runtime#1544 track the improvement to ZipArchiveEntry to permit it and dotnet/corefx#31362 track using that in System.IO.Packaging.
Greetings,
Apparently this issue is still present. When creating a streamed XLSX file with the OpenXmlWriter of Office Open XML SDK 2.10.0 and System.IO.Packaging 4.7.0 on .NET Core 2.1, memory consumption is not constant but proportional to the amount of data written. This does not happen on .NET Framework 4.6.2, which exhibits constant, low memory usage as expected.
Any hint is appreciated.
Thanks
@salvois Looks like the issue described here is slightly different. The issue you'll want to track is https://github.com/dotnet/runtime/issues/1544