Operating System: Windows
DocFX Version Used: DocFx Build Tasks - Visual Studio Marketplace Extension
Template used: pdf.default
Additional Information: I am able to create the PDF locally, just not in a build pipeline in Azure DevOps.
Steps to Reproduce: run the build with the following "pdf" section in docfx_pdf.json:
````
"pdf": {
  "content": [
    {
      "files": [
        "**/*.md",
        "**/**.yml"
      ],
      "exclude": [
        "**/toc.yml",
        "**/toc.md"
      ]
    },
    {
      "files": "pdf/toc.yml"
    }
  ],
  "resource": [
    {
      "files": [
        "**/*.png",
        "**/*.jpg",
        "**/*.gif",
        "**/*.pdf",
        "**/*.pptx",
        "favicon.ico"
      ],
      "exclude": [
        "**/obj/**",
        "**/includes/**",
        "**/_site_pdf/**"
      ]
    }
  ],
  "overwrite": [],
  "dest": "_site_pdf"
}
````
Expected Behavior:
Create PDF file
Actual Behavior:
The PDF file is not created, and the build pipeline reports these errors:
````
[19-05-06 06:59:08.197]Error:[PdfCommand.PDF]Error happen when converting pdf/toc.json to Pdf. Details: System.InvalidOperationException: The file is not a valid PDF document.
   at PdfSharp.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.SaveCore(Stream stream)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Save(String outputFileName)
   at Microsoft.DocAsCode.HtmlToPdf.ConvertWrapper.<>c__DisplayClass7_0.<ConvertCore>b__1(ManifestItem tocFile)
24 Warning(s)
1 Error(s)
##[error]Unable to run command: d:\a\1\s\docfx_pdf.json
code: 1
````
````
[19-05-06 06:58:57.112]Info:[PdfCommand.BuildCore.Build Document.LinkPhaseHandlerWithIncremental.Apply Templates]Applying templates to 1029 model(s)...
[19-05-06 06:58:57.707]Info:[PdfCommand.BuildCore.Build Document]XRef map exported.
[19-05-06 06:58:58.568]Info:[PdfCommand.Postprocess]Manifest file saved to manifest.json.
[19-05-06 06:58:58.661]Info:[PdfCommand]Completed building documents in 7219.7605 milliseconds.
[19-05-06 06:58:58.661]Info:[PdfCommand.PDF]Start generating PDF files...
[19-05-06 06:59:08.009]Error:[PdfCommand.PDF]Error happen when converting pdf/toc.json to Pdf. Details: System.InvalidOperationException: The file is not a valid PDF document.
   at PdfSharp.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.SaveCore(Stream stream)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Save(String outputFileName)
   at Microsoft.DocAsCode.HtmlToPdf.ConvertWrapper.<>c__DisplayClass7_0.<ConvertCore>b__1(ManifestItem tocFile)
[19-05-06 06:59:08.197]Info:[PdfCommand.PDF]Completed Scope:PdfCommand.PDF in 9530.6019 milliseconds.
[19-05-06 06:59:08.197]Info:[PdfCommand]Completed Scope:PdfCommand in 17068.4582 milliseconds.
[19-05-06 06:59:08.197]Info:Completed in 17073.8428 milliseconds
Build failed.
[19-05-06 06:59:08.197]Error:[PdfCommand.PDF]Error happen when converting pdf/toc.json to Pdf. Details: System.InvalidOperationException: The file is not a valid PDF document.
   at PdfSharp.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.SaveCore(Stream stream)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Save(String outputFileName)
   at Microsoft.DocAsCode.HtmlToPdf.ConvertWrapper.<>c__DisplayClass7_0.<ConvertCore>b__1(ManifestItem tocFile)
24 Warning(s)
1 Error(s)
stderr: QFont::setPixelSize: Pixel size <= 0 (0)
error: undefined
````
This looks like an issue introduced in v2.42, when docfx switched the component it uses to generate PDFs. Could you try v2.41?
I used the Chocolatey task to install DocFx v2.41.0 before running the DocFx Build Tasks. With v2.41.0, DocFx reports the error only as a warning, so the pipeline finishes, but the PDF comes out as 0 bytes.
Instead of the previous error, I now receive:
````
2019-05-08T19:04:34.7537835Z [19-05-08 07:04:34.515]Warning:[PdfCommand.PDF]Error happen when converting pdf/toc.json to Pdf. Details: One or more errors occurred.
````
Something looks wrong, but the root cause is still not clear from the error message. Could you provide a repo that reproduces the issue? Or, as a first step, could you share your pdf/toc.yml?
Here is the pdf/toc.yml. It does not have much in it, so I'm not sure it will be very helpful (I changed the file extension because "yml" attachments are not supported). Can you give any detail on what can trigger "Error happen when converting pdf/toc.json to Pdf"?
Nothing looks wrong in the TOC.
The only error I can see happens when the PDF processing library tries to open the intermediate PDF. You could try adding "keepRawFiles": true to the config to check whether something is wrong in the generated intermediate files.
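A minimal sketch of switching that option on from a script, for example in a pipeline step, assuming "keepRawFiles" belongs in the "pdf" section of the config (next to "dest") and that the file is plain JSON. The helper names are hypothetical.

```python
import json
from pathlib import Path


def enable_keep_raw_files(config: dict) -> dict:
    """Return a copy of a docfx config with pdf.keepRawFiles set to true.

    Assumption: the option lives in the "pdf" section, alongside "dest".
    """
    updated = json.loads(json.dumps(config))  # deep copy; config is plain JSON data
    updated.setdefault("pdf", {})["keepRawFiles"] = True
    return updated


def patch_config_file(path: Path) -> None:
    """Rewrite a docfx JSON config in place with keepRawFiles enabled."""
    # utf-8-sig also accepts files saved with a byte-order mark
    config = json.loads(path.read_text(encoding="utf-8-sig"))
    path.write_text(json.dumps(enable_keep_raw_files(config), indent=2), encoding="utf-8")
```

The copy keeps the original dictionary untouched, so the same loaded config can be compared before and after the change.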
Thank you for suggesting the "keepRawFiles" attribute, as it helps with debugging. I have not yet been able to determine the root issue, but here are a few more details.
d:\a\1\s\ <-- the folder into which DevOps downloads the content files from the repo.
d:\a\1\s\docfx_pdf.json <-- the PDF config file, the same one listed in the OP.
The process creates _Content-Docs-internal.json_, which contains only {}, and _Content-Docs-internal_pdf.pdf_, which is 0 bytes and therefore not a valid PDF.
When I run it locally, it creates _Content-Docs-internal.json_, which contains {"Content-Docs-internal_pdf.pdf":{"docset_name":"Content-Docs-internal","asset_id":"pdf/toc.json","toc_files":["pdf/toc.yml"]}}, and a valid PDF, _Content-Docs-internal_pdf.pdf_.
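The difference between the two runs can also be checked from a script. The following is a small sketch with a hypothetical helper name. It reads a PDF manifest like the ones quoted above and lists the PDFs it declares, so the pipeline's {} yields an empty result. The "toc_files" key is taken from the local manifest shown above, and other keys are ignored.

```python
import json
from pathlib import Path


def declared_pdfs(manifest_path: Path) -> dict:
    """Map each PDF name in a docfx PDF manifest to its list of TOC files.

    An empty result matches the pipeline case, where the file contains only {}.
    """
    data = json.loads(manifest_path.read_text(encoding="utf-8-sig"))
    return {name: entry.get("toc_files", []) for name, entry in data.items()}
```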
In Azure DevOps, the _toc.json_ file in the _site_pdf>_raw>_site_pdf>pdf folder is a binary match for the local file in the same location. All of the files required for PDF creation are copied correctly to the _site_pdf>_raw>… folders in DevOps.
Viewing the _site_pdf>_raw>_site_pdf>pdf>toc.html file shows the TOC correctly, and its links navigate to the linked pages, so the issue seems to lie in the creation of the JSON and PDF files.
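Both "The file is not a valid PDF document" (PdfSharp) and "PDF header signature not found" (iTextSharp, later in this thread) are consistent with an empty intermediate file. A PDF file is expected to start with the %PDF- header, and a 0-byte file has no header at all. The following sketch, with a hypothetical helper name, checks an intermediate or final file. It inspects only the header and does not validate the whole document.

```python
from pathlib import Path


def has_pdf_header(path: Path) -> bool:
    """Return True if the file starts with the '%PDF-' signature.

    A 0-byte file, like the one produced in the pipeline, returns False.
    This checks only the header; it does not validate the rest of the file.
    """
    with path.open("rb") as f:
        return f.read(5) == b"%PDF-"
```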
Maybe the wkhtmltopdf tool is not available on the build agent? I would expect some warnings in that case, though. Anyway, I feel this issue is related to this VSTS extension. You could ask for help on the extension's page.
I've tried the same in an Azure build, but without the VSTS extension.
I put docfx.exe on the build machine and started it using a CommandLine task:
````
steps:
- script: |
    echo Write your commands here
    echo Use the environment variables input below to pass secret variables to this script
    echo %docfxtool%
    echo %docfxjson%
    %docfxtool% pdf %docfxjson% -o %outpath%
  displayName: 'Command Line Script copy'
  env:
    docfxtool: $(Agent.HomeDirectory)\externals\docfx\docfx.exe
    docfxjson: docfx_project\docfx.json
    outpath: $(Build.BinariesDirectory)
````
The error is the same.
Note that docfx.exe is saved in the agent home directory, while wkhtmltopdf is installed under Program Files (C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe) and added to the system PATH.
However, I can log in to the build machine, run the same commands using cmd.exe, and the PDF is generated.
The only difference is that the build pipeline runs as a Windows service under the Network Service account, while I built with my ordinary (non-elevated) domain account.
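Because the agent runs under a different account, it may see a different environment than an interactive session. The following sketch is a diagnostic you could run in both contexts and compare, assuming Python is available on the agent. The function name is hypothetical. It reports the current user, where (if anywhere) wkhtmltopdf resolves on PATH, and the PATH entries themselves.

```python
import getpass
import os
import shutil


def describe_environment(tool: str = "wkhtmltopdf") -> dict:
    """Collect facts that can differ between a service account and a user session."""
    try:
        user = getpass.getuser()
    except (ImportError, KeyError, OSError):  # no user name available from env or passwd
        user = None
    return {
        "user": user,
        "tool_path": shutil.which(tool),  # None if the tool is not found on PATH
        "path_entries": os.environ.get("PATH", "").split(os.pathsep),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```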
Here is the error log with --loglevel verbose, and the content of \b\_site_pdf\_raw\_site_pdf\pdf\toc.json:
docfx.log.zip
upd: I put the command from the log (Executing wkhtmltopdf --javascript-delay 3000 -q --no-outline ...) into the CommandLine task, and the build agent executed it without any error.
However, I do not understand which file should be created, because there seems to be no parameter with the output file name.
upd2: I've found that docfx reads the PDF from wkhtmltopdf's standard output. I tried running the following locally on my dev computer:
````
wkhtmltopdf --javascript-delay 3000 -q --no-outline --encoding utf-8 --user-style-sheet "defaults/default-css.css" --read-args-from-stdin "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/TOC.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/releaseNotes.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../index.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/description.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/service.setup.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/service.config.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/service.https.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/serviceProxySetup.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/cwnet.description.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/cwnet.config.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/cwnet.mapping.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/wsdl.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/xml.samples.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/export.description.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/export.sample.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../articles/export.map.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../api/DirectSync.Interfaces.IDirectSyncServer.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../api/DirectSync.Interfaces.Dto.FormXmlDto.html" "D:/work/DirectClientDataSync/docfx_project/_site_pdf/_raw/_site_pdf/pdf/../api/DirectSync.Interfaces.Image.html" - > test.pdf
````
but there is no output, and wkhtmltopdf seems to freeze (it can be stopped with Ctrl+C). Removing the -q parameter changes nothing.
upd3: I removed --read-args-from-stdin, and it started to work on the dev machine, on the build server from the command line, and as part of the build pipeline. In all three cases it appears to produce a correct PDF.
It seems the problem is in reading the output from the started process.
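One likely explanation for the apparent freeze in the manual run is that --read-args-from-stdin makes wkhtmltopdf wait for more arguments on standard input, which looks like a hang in an interactive console. docfx instead writes the arguments to the child's stdin, closes it, and then reads the PDF bytes from stdout. The following minimal Python sketch shows that pattern. It uses a stand-in child process rather than wkhtmltopdf, and the helper name is hypothetical.

```python
import subprocess
import sys


def run_with_stdin_args(command: list[str], args_text: str, timeout: float = 60) -> bytes:
    """Send arguments on stdin and return everything the child writes to stdout.

    subprocess.run(input=...) uses communicate(), which writes the input, closes
    stdin, and drains stdout concurrently, so a full pipe buffer cannot block it.
    """
    result = subprocess.run(
        command,
        input=args_text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=False,
    )
    if not result.stdout:
        # Mirrors the pipeline failure: the child produced no bytes on stdout.
        stderr_text = result.stderr.decode(errors="replace")
        raise RuntimeError(f"no output (exit code {result.returncode}): {stderr_text}")
    return result.stdout


if __name__ == "__main__":
    # Stand-in child: echoes its stdin upper-cased instead of rendering a PDF.
    child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_with_stdin_args(child, "page1.html page2.html\n"))
```

Raising on empty output, rather than returning an empty byte string, surfaces the problem at the process boundary instead of later, when the PDF library fails to parse the result.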
I used the dev branch source code and added logging to the HtmlToPdfConverter.ConvertToStreamCore method as below:
````
using (var standardOutput = process.StandardOutput)
{
    standardOutput.BaseStream.CopyTo(stream);
    if (standardOutput.BaseStream.CanSeek)
        Logger.LogVerbose($"has read from the stdout, size:{standardOutput.BaseStream.Length/1024}KB");
    if (stream.CanSeek)
        Logger.LogVerbose($"has read from the stdout, total size:{stream.Length/1024}KB");
}
````
In the build log of a run started by the agent (a Windows service running as Network Service), it shows:
````
[19-05-31 10:29:04.126]Verbose:[PDF.wkhtmltopdf]has read from the stdout, total size:0KB
````
This means that stdout can not be read from the wkhtmltopdf tool
I've found a workaround, but I'm not sure it is suitable for a PR.
The problem can be solved by passing the input arguments in StartInfo.Arguments instead of through the standard input stream. I set _htmlToPdfOptions.IsReadArgsFromStdin = false; in the HtmlToPdfConverter.ConvertToStreamCore source code, built Microsoft.DocAsCode.HtmlToPdf.dll, and replaced it on the build server,
and there are no errors any more :)
````
_htmlToPdfOptions.IsReadArgsFromStdin = false; // hardcoded, I do not know how to set it using docfx parameters
using (var process = new Process
{
    StartInfo = new ProcessStartInfo
    {
        UseShellExecute = false,
        RedirectStandardInput = _htmlToPdfOptions.IsReadArgsFromStdin,
        RedirectStandardOutput = _htmlToPdfOptions.IsOutputToStdout,
        WindowStyle = ProcessWindowStyle.Hidden,
        FileName = Constants.PdfCommandName,
        // When arguments are not read from stdin, append them to the command line
        Arguments = _htmlToPdfOptions + (_htmlToPdfOptions.IsReadArgsFromStdin ? string.Empty : (" " + arguments)), // space is required
    }
})
{
    using (new LoggerPhaseScope(Constants.PdfCommandName))
    {
        Logger.LogVerbose($"Executing {process.StartInfo.FileName} {process.StartInfo.Arguments} ({arguments})");
        process.Start();
        if (_htmlToPdfOptions.IsReadArgsFromStdin)
        {
            using (var standardInput = process.StandardInput)
            {
                standardInput.AutoFlush = true;
                standardInput.Write(arguments);
            }
        }
        if (_htmlToPdfOptions.IsOutputToStdout)
        {
            using (var standardOutput = process.StandardOutput)
            {
                standardOutput.BaseStream.CopyTo(stream);
            }
            if (stream.CanSeek)
            {
                Logger.LogVerbose($"got {process.StartInfo.FileName} output {stream.Length}Bytes");
            }
        }
        process.WaitForExit(TimeoutInMilliseconds);
    }
}
````
However, I cannot explain this.
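The workaround above, expressed as a generic Python sketch rather than docfx code, looks like this. The arguments go in the argument list, stdin is not used, and stdout is still captured. subprocess.list2cmdline is the standard-library helper that builds the single Windows command-line string from a list. It is not part of the documented API, so treat it as a convenience for inspecting that string. The helper names are hypothetical.

```python
import subprocess
import sys


def run_with_argv(command: list[str], extra_args: list[str], timeout: float = 60) -> bytes:
    """Pass arguments on the command line (not stdin) and return the child's stdout."""
    result = subprocess.run(
        command + extra_args,
        stdin=subprocess.DEVNULL,  # nothing is read from stdin in this mode
        stdout=subprocess.PIPE,
        timeout=timeout,
        check=True,
    )
    return result.stdout


def windows_command_line(args: list[str]) -> str:
    """Return the single command-line string Windows would receive for this list."""
    return subprocess.list2cmdline(args)


if __name__ == "__main__":
    # Stand-in child: prints its own arguments instead of rendering a PDF.
    child = [sys.executable, "-c", "import sys; sys.stdout.write(' '.join(sys.argv[1:]))"]
    print(run_with_argv(child, ["toc.html", "index.html"]))
```

The trade-off is that the whole argument list now counts against the operating system's command-line length limit, which stdin does not have.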
Here is source code that reproduces the problem, as a gist.
It works as a console app (and as a service) but fails (no bytes are read) when started as part of an Azure build pipeline:
````
displayName: 'Command Line Script html'
````
Fixed in 2.43.2.
The workaround added in https://github.com/dotnet/docfx/pull/4719 (adding "noStdin" : true to _docfx.json_) does not work for me. I now get the following error:
````
Convert failed, will retry in 00:00:05seconds
Convert the file : "suuuuuuuper long list of html files here" - has exception, the details: The filename or extension is too long
[19-12-20 09:16:14.667]Error:[PDF]Error happen when converting pdf/toc.json to Pdf. Details: iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(IRandomAccessSource byteSource, Boolean partialRead, Byte[] ownerPassword, X509Certificate certificate, ICipherParameters certificateKey, Boolean closeSourceOnConstructorError)
   at iTextSharp.text.pdf.PdfReader..ctor(Stream isp)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.SaveCore(Stream stream)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Save(String outputFileName)
   at Microsoft.DocAsCode.HtmlToPdf.ConvertWrapper.<>c__DisplayClass7_0.<ConvertCore>b__1(ManifestItem tocFile)
[19-12-20 09:16:14.667]Info:[PDF]Completed Scope:PDF in 257010.934 milliseconds.
[19-12-20 09:16:14.667]Info:Completed in 303690.9001 milliseconds
````
Is there a way to reduce the list of arguments passed?
Well, it seems that you have hit the command-line length limit.
How long is the _"suuuuuuuper long list of html files here"_?
505,021 characters; 350 HTML files are passed.
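Some arithmetic on these numbers helps. Windows limits a process command line (the string passed to CreateProcess) to 32,767 characters, so 505,021 characters is more than 15 times over the limit. That is consistent with "The filename or extension is too long". The following sketch shows two generic ways to shrink an argument list. It assumes the converter could be run once per batch, with the resulting PDFs merged afterwards (merging is not shown). This is not a docfx option. The helper names and the safety margin are hypothetical.

```python
import os
import subprocess

# CreateProcess limits the command line to 32,767 characters, including the terminating null.
WINDOWS_CMDLINE_LIMIT = 32_767
SAFETY_MARGIN = 1_024  # hypothetical headroom; tune as needed


def relative_paths(paths: list[str], base_dir: str) -> list[str]:
    """Shorten paths by making them relative to base_dir (run the tool with cwd=base_dir)."""
    return [os.path.relpath(p, base_dir) for p in paths]


def batch_arguments(prefix: list[str], files: list[str], suffix: list[str],
                    limit: int = WINDOWS_CMDLINE_LIMIT - SAFETY_MARGIN) -> list[list[str]]:
    """Split files into full argument lists (prefix + files + suffix) under limit characters.

    A single file whose own command line exceeds the limit still gets a batch of its
    own, so check the result if individual paths can be extremely long.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    for name in files:
        candidate = prefix + current + [name] + suffix
        if current and len(subprocess.list2cmdline(candidate)) > limit:
            batches.append(prefix + current + suffix)  # close the batch before it overflows
            current = [name]
        else:
            current.append(name)
    if current:
        batches.append(prefix + current + suffix)
    return batches
```

Relative paths help most when every file shares a long absolute prefix. Batching is the fallback when the list is still too long after that.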