All ZipFile APIs are currently synchronous. This means manipulations to zip files will always block a thread. We should investigate using an async file and calling async IO APIs (ReadAsync/WriteAsync) to free up the thread while IO is happening. https://github.com/dotnet/corefx/pull/5680
This needs someone to write up a formal API proposal then we can review and if approved, offer up for implementation. Anyone?
Docs on the API review process and see Jon's API request for a great example of a strong proposal. The more concrete the proposal is (e.g. has examples of usage, real-world scenarios, fleshed out API), the more discussion it will start and the better the chances of us being able to push the addition through the review process.
Forked the repo, and working on an api proposal.
Seems like most of the work outlined in this item is already done. e.g
https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs#L682
https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs#L732
That probably means this item is done, right?
@danmosemsft
Not quite, @abdulbeard. This particular issue is around adding ZipFile async APIs.
For example, currently we have ZipFile.ExtractToDirectory(string, string)
. This issue is around adding async versions of those API e.g. ZipFile.ExtractToDirectoryAsync(string, string)
.
Gotcha @ianhays . I did a terrible job reading the title 馃槢
Curently, System.IO.Compression.Zipfile doesn't have async methods for CreateFromDirectory
, ExtractToDirectory
, Open
and OpenRead
. This means that these zipfile operations are thread-blocking, and not asynchronous, and hence, comparatively less performant.
namespace System.IO.Compression {
public partial class Zipfile {
public static Task CreateFromDirectoryAsync(string sourceDirectoryName, string destinationArchiveFileName, CancellationToken cancellationToken = default(CancellationToken));
public static Task CreateFromDirectoryAsync(string sourceDirectoryName, string destinationArchiveFileName, CompressionLevel compressionLevel, bool includeBaseDirectory, CancellationToken cancellationToken = default(CancellationToken));
public static Task CreateFromDirectoryAsync(string sourceDirectoryName, string destinationArchiveFileName, CompressionLevel compressionLevel, bool includeBaseDirectory, Encoding entryNameEncoding, CancellationToken cancellationToken = default(CancellationToken));
public static Task ExtractToDirectoryAsync(string sourceArchiveFileName, string destinationDirectoryName, CancellationToken cancellationToken = default(CancellationToken));
public static Task ExtractToDirectoryAsync(string sourceArchiveFileName, string destinationDirectoryName, Encoding entryNameEncoding, CancellationToken cancellationToken = default(CancellationToken));
public static Task<ZipArchive> OpenAsync(string archiveFileName, ZipArchiveMode mode, CancellationToken cancellationToken = default(CancellationToken));
public static Task<ZipArchive> OpenAsync(string archiveFileName, ZipArchiveMode mode, Encoding entryNameEncoding, CancellationToken cancellationToken = default(CancellationToken));
public static Task<ZipArchive> OpenReadAsync(string archiveFileName, CancellationToken cancellationToken = default(CancellationToken));
}
}
These new functions are asynchronous versions of their synchronous counterparts. This allows the .net runtime to allow the thread to do other work while the asynchronous operation completes (as mandated by the TPL and async/await scheduling).
CancellationToken
, by the user, allowing them more control.Here's a proposal ^. Looking forward to constructive feedback and insight.
Should async
APIs be added to ZipArchive
at the same time? For example, consider the following code:
```c#
using (var archive = await ZipFile.OpenAsync("archive.zip", ZipArchiveMode.Update))
{
archive.CreateEntry("empty.txt");
}
using (var archive = await ZipFile.OpenReadAsync("archive.zip"))
{
foreach (var entry in archive.Entries)
{
Console.WriteLine(entry.FullName);
}
}
```
As far as I can tell from skimming the source code, even though the above code uses the proposed async
APIs as much as it can, it still blocks when accessing Entries
and, more insidiously, when disposing the first archive
.
namespace System.IO.Compression {
public class ZipArchive {
public Task<ReadOnlyCollection<ZipArchiveEntry>> GetEntriesAsync();
public Task<ZipArchiveEntry> GetEntryAsync(string entryName, CancellationToken cancellationToken = default(cancellationToken));
}
}
@abdulbeard How does that help? As far as I can tell, CreateEntry
does not block in the current implementation. On the other hand, accessing Entries
and calling Dispose()
can block, depending on the mode.
@svick you're right. Async versions of CreateEntry don't do much, since their underlying functionality is not blocking.
On the matter of Dispose()
being async, not sure there's a good way to do that.
In the matter of Entries
, I can imagine replacing that with something like:
public Task<ReadOnlyCollection<ZipArchiveEntry>> GetEntriesAsync();
public Task<ZipArchiveEntry> GetEntryAsync(string entryName, CancellationToken cancellationToken = default(cancellationToken));
Amended the proposal above.
@JeremyKuhne could you give feedback on this proposal? Perhaps we can move it forward
@abdulbeard
On the matter of Dispose() being async, not sure there's a good way to do that.
Yeah, there isn't a great approach until https://github.com/dotnet/roslyn/issues/114 is added. But in the meantime, I think there are ways to make it work. The approaches I considered are:
Add FlushAsync()
. This would write the current state of the ZipArchive
to disk, which means that calling Dispose()
right after FlushAsync()
would not block.
Example usage:
c#
using (var archive = await ZipFile.OpenAsync("archive.zip", ZipArchiveMode.Update))
{
archive.CreateEntry("empty.txt");
await archive.FlushAsync();
}
Add DisposeAsync()
. Once https://github.com/dotnet/roslyn/issues/114 is added, you would be able to use it with using
.
Example usage:
c#
var archive = await ZipFile.OpenAsync("archive.zip", ZipArchiveMode.Update)
try
{
archive.CreateEntry("empty.txt");
}
finally
{
await archive.DisposeAsync();
}
Looking at the two examples above, I think adding FlushAsync()
is the better option.
@svick
To me it makes sense to wait on resolution of dotnet/roslyn/#114 before deciding on FlusyAsync
vs DisposeAsync
.
But I can also see value in finishing out this stage with FlushAsync
, and then once we have an async dispose, let that replace FlushAsync
.
But I'd love to hear from @JeremyKuhne etc. as well.
@danmosemsft @JeremyKuhne any suggestions for changes to ^, or shall we move forward?
Anything I can do to help move this along?
crickets..........
@abdulbeard Sorry, the actual owner here moved on to another project and this got lost in the shuffle. 馃槮 Thanks for poking on it.
@stephentoub can you give some direction/guidance on the disposal questions? @ericstj do you think this proposal provides what you were looking for?
Presumably IAsyncEnumerable<ZipArchiveEntry>
would be the right choice for GetEntriesAsync
...
@ahsonkhan, @ViktorHofer can one of you take this and drive it forward?
can you give some direction/guidance on the disposal questions?
We have added IAsyncDisposable
and C# is adding await using
that works with it. If disposing a ZipArchive
does work that could be asynchronous (e.g. flushing a stream), having ZipArchive
implement it is quite reasonable. Note that Stream
itself now implements IAsyncDisposable
as well.
Presumably IAsyncEnumerable
would be the right choice for GetEntriesAsync
It depends. Is ZipArchive lazily reading from a stream, e.g. when you go to read the next entry, might that perform a read on the stream? If yes, then having it return an IAsyncEnumerable could make sense. If no, then it should probably return a synchronous enumerable.
Just curious is there any update on this issue? We have a .NET Core project that creates zip files in Azure blobs, and we are forced to use the sync version of ZipArchive apis because the async ones will cause unit tests to hang.
I have some code that is generating and streaming a zip archive on the fly, which fails if AllowSynchronousIO is false, which is the default from core 3.0.0-preview3 onwards:
https://github.com/aspnet/AspNetCore/issues/7644
System.InvalidOperationException: Synchronous operations are disallowed. Call WriteAsync or set AllowSynchronousIO to true instead.
at Microsoft.AspNetCore.Server.IIS.Core.HttpResponseStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at Microsoft.AspNetCore.Server.IIS.Core.WrappingStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.Compression.PositionPreservingWriteOnlyStreamWrapper.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.BinaryWriter.Write(UInt32 value)
at System.IO.Compression.ZipArchiveEntry.WriteCentralDirectoryFileHeader()
at System.IO.Compression.ZipArchive.WriteFile()
at System.IO.Compression.ZipArchive.Dispose(Boolean disposing)
at System.IO.Compression.ZipArchive.Dispose()
Like @kccsf I have simillar issue with the Kestrel server configured to disallow synchronous IO. With the current implementation I can not use ZipArchive directly with the Kestrel's response stream - because of synchronous disposing.
cc @carlossanlop
Just hit this today as well. :\ After writing the entries, Dispose() throws the exception. Worked around by allowing sync I/O at that point for now, but would be nice to have some async API that I could call to remove the need for sync I/O.
Triage:
We considered this. Next step is API design and review.
Any update on this? I just hit this as well when trying to stream directly to the response stream in my v3 Azure function.
whats about thread safety of ZipArchive.CreateEntryFromFile / CreateEntryFromFileAsync? I didn't find any information about whether the current implementation is thread safe. Will the new proposal be thread safe?
Background: I assume, if I could process multiple files in parallel, the CPU and Disk I/O would be fully used.
Most helpful comment
Any update on this? I just hit this as well when trying to stream directly to the response stream in my v3 Azure function.