The weblog message content should also provide the manifest information on workflow submission, so that it can be used for remote database logging.
This is useful when you want to log your workflows persistently in a remote database and need to relate the workflow manifest to the trace data.
Fetch the manifest object from the Session object as a Map via Session.manifest.toMap() in the WebLogObserver class. Add a manifest property to the JSON message if manifest data is provided.
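A minimal sketch of the proposed behaviour, written in Python purely for illustration (the real change would live in the Groovy WebLogObserver class). The function name `build_weblog_message` and the "started" event name in the usage note are assumptions; the field names runId, runName, event and utcTime follow the example payload posted later in this thread.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_weblog_message(
    run_id: str,
    run_name: str,
    event: str,
    manifest: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a weblog-style message; 'manifest' is added only when provided."""
    message: Dict[str, Any] = {
        "runId": run_id,
        "runName": run_name,
        "event": event,
        "utcTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # Skip the property entirely when there is no manifest data (None or empty map).
    if manifest:
        message["manifest"] = dict(manifest)
    return message
```

Calling `build_weblog_message(run_id, run_name, "started", manifest)` on submission and omitting the manifest argument for later events would keep the trace messages small.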
@johandahlberg @apeltzer @ewels any suggestions on this? The current PR #1077 only sends the manifest data on workflow submission, which I think is sufficient, as you can store the manifest data with the uuid4 of the run and link further trace messages to it. What do you think?
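On the receiving side, linking could look roughly like the sketch below. It assumes an in-memory SQLite database and a hypothetical two-table layout (`runs`, `traces`); none of this is part of Nextflow, and the helper names are made up for illustration. The only thing taken from the thread is that every message carries the run's id (runId).

```python
import json
import sqlite3
from typing import Any, Dict, List


def open_store() -> sqlite3.Connection:
    """Create an in-memory store: one manifest row per run, many trace rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY, manifest TEXT NOT NULL)")
    conn.execute("CREATE TABLE traces (run_id TEXT NOT NULL, payload TEXT NOT NULL)")
    return conn


def record_message(conn: sqlite3.Connection, message: Dict[str, Any]) -> None:
    """Store the manifest once per run; store every other message as a trace row."""
    run_id = message["runId"]
    if "manifest" in message:
        conn.execute(
            "INSERT OR REPLACE INTO runs (run_id, manifest) VALUES (?, ?)",
            (run_id, json.dumps(message["manifest"])),
        )
    else:
        conn.execute(
            "INSERT INTO traces (run_id, payload) VALUES (?, ?)",
            (run_id, json.dumps(message)),
        )
    conn.commit()


def traces_with_manifest(conn: sqlite3.Connection, run_id: str) -> List[Dict[str, Any]]:
    """Join trace rows back to the manifest stored for the same run id."""
    rows = conn.execute(
        "SELECT r.manifest, t.payload FROM traces t "
        "JOIN runs r ON r.run_id = t.run_id WHERE t.run_id = ?",
        (run_id,),
    ).fetchall()
    return [{"manifest": json.loads(m), "message": json.loads(p)} for m, p in rows]
```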
Sounds great! Yes I think that’s probably sufficient.
If we’re sending this, why not just send the entire config output on the workflow start?
Sounds reasonable, though I don't know enough about the inner workings of nextflow to know the details of how this could best be achieved. :smile: I agree with @ewels that it might be useful to get the full config submitted at the start of the workflow.
Is there really a need for that? I would prefer to follow a KISS approach.
I think that we will definitely need more information about the workflows - the workflow run time variables at least (eg. the directories being used and user etc). If it's just the manifest then I could imagine us having a nice webpage showing 50 different rnaseq pipelines running with no real information to differentiate between them. It could be nice to be able to show the input data used from params.input or something for example.
Because we expect people to implement different tools to monitor workflows using this, I would advocate just sending everything and then people can choose what to use. Shouldn't be much data still and should be pretty easy to serialise into a JSON object.
I see the rationale, but the full config can contain sensitive information, e.g. cloud security credentials, tokens and passwords as env vars, etc. Therefore it would be necessary to strip all this information.
hmm, yep, ok. How about just params and workflow? I guess the sensitive stuff will be in executor and env scopes usually?
Hmm, not sure this is feasible but I agree that a restriction makes sense. I assume the same as Phil that an exclusion of the executor and env scopes could already suffice if properly documented? Could also make sure to mention that in the docs for the weblog feature that people shouldn't have params with sensitive info?
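The restriction being discussed could be sketched as an allow-list over top-level config scopes. This is an illustration in Python, not Nextflow code: `select_scopes` is a hypothetical helper and the config is assumed to be a plain nested map. Note that it does nothing about secrets placed inside params, which is why the documentation note suggested above would still be needed.

```python
from typing import Any, Dict, Iterable


def select_scopes(
    config: Dict[str, Any],
    allowed: Iterable[str] = ("params", "workflow"),
) -> Dict[str, Any]:
    """Keep only allow-listed top-level scopes; env, executor, etc. are dropped."""
    allowed_set = set(allowed)
    return {scope: value for scope, value in config.items() if scope in allowed_set}
```

An allow-list is used rather than a deny-list of env and executor so that any scope added later stays out of the payload until someone opts it in.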
I agree, the information in the params scope would indeed be beneficial for internal quality assessment procedures. Having it in the weblog message makes it easy to have information relevant for reusability...
+1 for workflow, which is supposed to hold all workflow metadata. Eventually also params; however, the latter may be incomplete because it would not reflect parameters set in the main script. It could therefore be confusing, and it would be better not to include it for now.
How much work is required to get the parameters set in the main script? I suspect that this will be a fairly high priority for weblog to be picked up.
Complexity index 9 out of 10. The problem here is that the parameters in the script are just variable assignments. To evaluate them you need to run the full script.
The new modules syntax will introduce some changes that may allow parsing them without running the full pipeline. However, that won't happen anytime soon.
@ewels @johandahlberg I have now appended the WorkflowMetadata content to the payload. For which event types do you want it included?
Thanks for the clarification @pditommaso!
@sven1103 I think once at the beginning of each workflow should suffice - or does it make more sense to send it upon successful (?) finalization of the job? Then we don't have to filter it afterward as it will only be there when a job finishes successfully... ?
To evaluate them you need to run the full script.
@pditommaso - but we're talking about sending this weblog event after the workflow has started, so we're already running the full script here anyway?
@sven1103 @apeltzer - I think maybe both start and end? Some would be useful to have at the end, such as workflow.success which obviously only makes sense then. But most things would be most useful when the workflow first starts.
we're talking about sending this weblog event after the workflow has started
Ouch, I was stuck in the other config issue! Yes, here params are valid when the execution starts.
@ewels @sven1103
True, might really make sense to have both beginning and end.
The metadata are complete on start, so why should they be sent at the end?
For example: workflow completion date-time, duration, success flag, exit status
+1
OK, implemented and it works. I still need to gather the params.
Example output as an appetizer:
"metadata": {
  "start": "2019-03-20T13:31:40+0000",
  "projectDir": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping",
  "manifest": {
    "nextflowVersion": ">=18.10.1",
    "defaultBranch": "master",
    "version": "1.1.4",
    "homePage": "https://github.com/nf-core/hlatyping",
    "gitmodules": null,
    "description": "Precision HLA typing from next-generation sequencing data.",
    "name": "nf-core/hlatyping",
    "mainScript": "main.nf",
    "author": null
  },
  "complete": "2019-03-20T13:32:36+0000",
  "profile": "docker,test",
  "homeDir": "/Users/sven1103",
  "workDir": "/Users/sven1103/git/nextflow/work",
  "container": "nfcore/hlatyping:1.1.4",
  "commitId": "4bcced898ee23600bd8c249ff085f8f88db90e7c",
  "errorMessage": null,
  "repository": "https://github.com/nf-core/hlatyping.git",
  "containerEngine": "docker",
  "scriptFile": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping/main.nf",
  "userName": "sven1103",
  "launchDir": "/Users/sven1103/git/nextflow",
  "runName": "elated_murdock",
  "configFiles": [
    "/Users/sven1103/.nextflow/assets/nf-core/hlatyping/nextflow.config"
  ],
  "sessionId": "2a45ef7d-6dc8-4cbc-a51c-f74483ded0c9",
  "errorReport": null,
  "scriptId": "2902f5aa7f297f2dccd6baebac7730a2",
  "revision": "master",
  "exitStatus": 0,
  "commandLine": "./launch.sh run nf-core/hlatyping -profile docker,test -with-weblog 'http://localhost:4567'",
  "nextflow": {
    "version": {
      "minor": "03",
      "major": "19",
      "patch": "0-edge"
    },
    "build": 5114,
    "timestamp": "20-03-2019 13:25 UTC"
  },
  "stats": {
    "computeTimeFmt": "(a few seconds)",
    "cachedCount": 0,
    "cachedDuration": {
      "days": 0,
      "millis": 0,
      "hours": 0,
      "minutes": 0,
      "seconds": 0,
      "durationInMillis": 0
    },
    "failedDuration": {
      "days": 0,
      "millis": 0,
      "hours": 0,
      "minutes": 0,
      "seconds": 0,
      "durationInMillis": 0
    },
    "succeedDuration": {
      "days": 0,
      "millis": 37266,
      "hours": 0,
      "minutes": 0,
      "seconds": 37,
      "durationInMillis": 37266
    },
    "failedCount": 0,
    "cachedPct": 0.0,
    "cachedCountFmt": "0",
    "succeedCountFmt": "6",
    "failedPct": 0.0,
    "failedCountFmt": "0",
    "ignoredCountFmt": "0",
    "ignoredCount": 0,
    "succeedPct": 100.0,
    "succeedCount": 6,
    "ignoredPct": 0.0
  },
  "resume": false,
  "success": true,
  "scriptName": "main.nf",
  "duration": {
    "days": 0,
    "millis": 55688,
    "hours": 0,
    "minutes": 0,
    "seconds": 55,
    "durationInMillis": 55688
  }
},
"runId": "2a45ef7d-6dc8-4cbc-a51c-f74483ded0c9",
"event": "completed",
"runName": "elated_murdock",
"runStatus": "completed",
"utcTime": "2019-03-20T13:32:37Z"
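To illustrate how a monitoring tool might consume such a message, here is a small Python sketch. It assumes the payload has been wrapped in braces and parsed into a dict, and that the metadata has the flat layout shown above; `summarize_completed` is a hypothetical helper, not part of Nextflow.

```python
from typing import Any, Dict


def summarize_completed(message: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a 'completed' weblog message to the fields a run overview needs."""
    meta = message["metadata"]
    stats = meta.get("stats", {})
    return {
        "runName": message["runName"],
        "pipeline": meta["manifest"]["name"],
        "version": meta["manifest"]["version"],
        "success": meta["success"],
        # durationInMillis carries the full duration; convert to seconds.
        "durationSeconds": meta["duration"]["durationInMillis"] / 1000,
        "succeeded": stats.get("succeedCount", 0),
        "failed": stats.get("failedCount", 0),
    }
```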
@sven1103 I agree with the others here that it would make sense to send it at the workflow start, and at workflow finishing (regardless of whether it was successful or not).
@johandahlberg It will now be sent when the workflow is started and when it is completed. In case of failure, the success boolean property is false and the errorReport and errorMessage properties carry more detailed information. At least it works when I intentionally break my pipeline :D
For example, when the Docker daemon is not running:
...
"errorMessage": "docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.\nSee 'docker run --help'.",
...
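A receiver could turn these properties into a one-line status roughly as follows. This is only a sketch: `describe_outcome` is a hypothetical helper, and it assumes the metadata dict exposes success, errorMessage and errorReport as described above.

```python
from typing import Any, Dict


def describe_outcome(metadata: Dict[str, Any]) -> str:
    """Return a short status line based on success, errorMessage and errorReport."""
    if metadata.get("success"):
        return "succeeded"
    reason = (
        metadata.get("errorMessage")
        or metadata.get("errorReport")
        or "no error details provided"
    )
    # Error messages can span several lines; keep only the first for a summary.
    return "failed: " + reason.splitlines()[0]
```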
Awesome work @sven1103 😁
Once the params are added then I think we should have pretty much everything we'll need 👍
@sven1103 cool! Looks very useful.
Ok, params sneak preview (nf-core/hlatyping):
"metadata": {
  "params": {
    "container": "nfcore/hlatyping:1.1.4",
    "help": false,
    "outdir": "results",
    "bam": true,
    "singleEnd": false,
    "single-end": false,
    "reads": "data/test*{1,2}.fq.gz",
    ...
  },
  "workflow": {
    "start": "2019-03-20T19:30:08Z",
    "projectDir": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping",
    "manifest": {
      "nextflowVersion": ">=18.10.1",
      "defaultBranch": "master",
      "version": "1.1.4",
      "homePage": "https://github.com/nf-core/hlatyping",
      "gitmodules": null,
      "description": "Precision HLA typing from next-generation sequencing data.",
      "name": "nf-core/hlatyping",
      "mainScript": "main.nf",
      "author": null
    },
    "complete": null,
    "profile": "docker,test",
    ...
  }
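With params in the payload, a dashboard could tell apart several runs of the same pipeline, which was the concern raised earlier in the thread. A rough Python sketch, assuming the nested params/workflow layout from this preview and using the reads parameter of this particular pipeline (other pipelines will name their inputs differently); `run_label` is a hypothetical helper.

```python
from typing import Any, Dict


def run_label(message: Dict[str, Any]) -> str:
    """Build a display label from the workflow manifest, profile and input params."""
    meta = message["metadata"]
    workflow = meta["workflow"]
    manifest = workflow["manifest"]
    # 'reads' is specific to this pipeline; fall back when it is absent.
    reads = meta.get("params", {}).get("reads", "unknown input")
    return f"{manifest['name']} {manifest['version']} [{workflow['profile']}] reads={reads}"
```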
@rsuchecki workflow metadata will soon be sent as a JSON payload as well: #1077. This issue is solved imo, so I will close it :)