Nextflow: Manifest data in weblog message

Created on 15 Mar 2019 · 29 comments · Source: nextflow-io/nextflow

Manifest data in weblog message

The weblog message content should also include the manifest information when the workflow is submitted, so that it can be used for remote database logging.

Usage scenario

When you want to log your workflows persistently in a remote database and need to relate the workflow manifest to the trace data.

Suggested implementation

In the WebLogObserver class, fetch the manifest from the Session object as a Map via Session.manifest.toMap(). If manifest data is provided, add a manifest property to the JSON message.
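The "add the property only when manifest data is provided" rule can be sketched in Python. This is not the WebLogObserver code (that lives in Nextflow's Groovy sources); build_message and its arguments are hypothetical names, the manifest argument stands in for the Map returned by Session.manifest.toMap(), and the other keys (runId, runName, event, utcTime) are borrowed from the example payloads later in this thread.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_message(run_id: str, run_name: str, event: str,
                  manifest: Optional[Dict[str, Any]] = None) -> str:
    """Serialise a weblog-style message; 'manifest' is attached only if provided."""
    message: Dict[str, Any] = {
        "runId": run_id,
        "runName": run_name,
        "event": event,
        # UTC timestamp in the same shape as the thread's examples, e.g. 2019-03-20T13:32:37Z
        "utcTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # Skip the property entirely when there is no (or empty) manifest data
    if manifest:
        message["manifest"] = manifest
    return json.dumps(message)
```

Omitting the key, rather than sending "manifest": null, lets a consumer test for its presence to recognise the submission message.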

Source

sven1103




@johandahlberg @apeltzer @ewels any suggestions on this? The current PR #1077 only sends the manifest data on workflow submission, which I think is sufficient, as you can store the manifest data with the UUID4 of the run and link further trace messages to it. What do you think?

sven1103 on 18 Mar 2019
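A consumer-side sketch of that linking idea: store the manifest once under the run's UUID (the runId field), then join later messages to it. The table layout, the function names and the choice of SQLite are assumptions for illustration only; nothing here is part of Nextflow.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    # In-memory database for the sketch; a real deployment would use a persistent DB
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY, manifest TEXT)")
    conn.execute("CREATE TABLE traces (run_id TEXT, event TEXT, payload TEXT)")
    return conn


def handle_message(conn: sqlite3.Connection, message: dict) -> None:
    run_id = message["runId"]
    if "manifest" in message:
        # The manifest arrives once, on submission: keep it keyed by the run UUID
        conn.execute("INSERT OR REPLACE INTO runs VALUES (?, ?)",
                     (run_id, json.dumps(message["manifest"])))
    else:
        # Every other message is stored with its runId so it can be joined later
        conn.execute("INSERT INTO traces VALUES (?, ?, ?)",
                     (run_id, message["event"], json.dumps(message)))
    conn.commit()


def events_for_pipeline(conn: sqlite3.Connection, name: str) -> list:
    # Join trace rows to their run's manifest, then filter on the manifest name
    rows = conn.execute(
        "SELECT t.run_id, t.event, r.manifest FROM traces t "
        "JOIN runs r ON r.run_id = t.run_id")
    return [(run_id, event) for run_id, event, manifest in rows
            if json.loads(manifest).get("name") == name]
```

The manifest filter is done in Python rather than with SQLite's JSON functions, which keeps the sketch independent of how SQLite was built.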


Sounds great! Yes I think that’s probably sufficient.

If we’re sending this, why not just send the entire config output at workflow start?

ewels on 18 Mar 2019

Sounds reasonable, though I don't know enough about the inner workings of Nextflow to know how this could best be achieved. 😄 I agree with @ewels that it might be useful to have the full config submitted at the start of the workflow.

johandahlberg on 19 Mar 2019

Is there really a need for that? I would prefer to follow a KISS approach.

pditommaso on 19 Mar 2019

I think that we will definitely need more information about the workflows: at least the workflow runtime variables (e.g. the directories being used, the user, etc.). If it's just the manifest, then I could imagine us having a nice webpage showing 50 different rnaseq pipelines running with no real information to differentiate between them. It would be nice to be able to show the input data used, from params.input or similar, for example.

Because we expect people to build different tools on top of this to monitor workflows, I would advocate just sending everything and letting people choose what to use. It still shouldn't be much data, and it should be pretty easy to serialise into a JSON object.

ewels on 19 Mar 2019

I see the rationale, but the full config can contain sensitive information, e.g. cloud security credentials, or tokens and passwords set as env vars. All of this information would therefore need to be stripped.

pditommaso on 19 Mar 2019

Hmm, yep, OK. How about just params and workflow? I guess the sensitive stuff will usually be in the executor and env scopes?

ewels on 19 Mar 2019

Hmm, not sure this is feasible, but I agree that a restriction makes sense. Like Phil, I assume that excluding the executor and env scopes could already suffice, if properly documented? We could also mention in the docs for the weblog feature that people shouldn't put sensitive info in params?

apeltzer on 19 Mar 2019
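The scope exclusion proposed here can be sketched as a filter over a nested config map. The excluded scope names (executor, env) come from the two comments above; the function name and the assumption that the config is available as a plain dict of top-level scopes are hypothetical.

```python
from typing import Any, Dict, Iterable

# Scopes the discussion suggests are most likely to hold secrets (an assumption, not a rule)
DEFAULT_EXCLUDED_SCOPES = ("executor", "env")


def strip_sensitive_scopes(config: Dict[str, Any],
                           excluded: Iterable[str] = DEFAULT_EXCLUDED_SCOPES) -> Dict[str, Any]:
    """Return a shallow copy of the config without the excluded top-level scopes."""
    drop = set(excluded)
    # The input map is left untouched; only the copy omits the excluded scopes
    return {scope: value for scope, value in config.items() if scope not in drop}
```

Filtering by scope name is only as safe as the convention behind it: a token placed in params would still be sent, which is why documenting the behaviour matters as much as the filter itself.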

I agree, the information in the params scope would indeed be beneficial for internal quality assessment procedures. Having it in the weblog message makes information relevant for reusability easy to access...

sven1103 on 19 Mar 2019

+1 for workflow, which is supposed to hold all the workflow metadata. Possibly also params; however, the latter may be incomplete because it would not reflect parameters set in the main script. That could be confusing, so it would be better not to include it for now.

pditommaso on 19 Mar 2019


How much work is required to get the parameters set in the main script? I suspect this will be a fairly high priority for weblog to get picked up.

ewels on 19 Mar 2019

Complexity index 9 out of 10. The problem here is that the parameters in the script are just variable assignments.

To evaluate them you need to run the full script.

The new modules syntax will introduce some changes that may allow parsing them without running the full pipeline.

However, that's not coming anytime soon.


pditommaso on 20 Mar 2019

@ewels @johandahlberg I have now appended the WorkflowMetadata content to the payload. For which event types do you want it to be included?

sven1103 on 20 Mar 2019

Thanks for the clarification, @pditommaso!

@sven1103 I think once at the beginning of each workflow should suffice. Or does it make more sense to send it upon successful (?) completion of the job? Then we wouldn't have to filter it afterwards, as it would only be there when a job finishes successfully...?

apeltzer on 20 Mar 2019

To evaluate them you need to run the full script.

@pditommaso but we're talking about sending this weblog event after the workflow has started, so we're already running the full script here anyway?

ewels on 20 Mar 2019

@sven1103 @apeltzer I think maybe both start and end? Some fields would be useful to have at the end, such as workflow.success, which obviously only makes sense then. But most things would be most useful when the workflow first starts.

ewels on 20 Mar 2019


we're talking about sending this weblog event after the workflow has started

Ouch, I was stuck on the other config issue! Yes, here params are valid when the execution starts.

pditommaso on 20 Mar 2019


@ewels @sven1103 True, it might really make sense to have both beginning and end.

apeltzer on 20 Mar 2019

The metadata are complete at start; why should they be sent at the end?

pditommaso on 20 Mar 2019

For example: workflow completion date-time, duration, success flag, exit status

sven1103 on 20 Mar 2019

pditommaso on 20 Mar 2019


OK, implemented and it works. I still need to gather the params.

sven1103 on 20 Mar 2019

Example output as an appetizer:

"metadata": {
        "start": "2019-03-20T13:31:40+0000",
        "projectDir": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping",
        "manifest": {
            "nextflowVersion": ">=18.10.1",
            "defaultBranch": "master",
            "version": "1.1.4",
            "homePage": "https://github.com/nf-core/hlatyping",
            "gitmodules": null,
            "description": "Precision HLA typing from next-generation sequencing data.",
            "name": "nf-core/hlatyping",
            "mainScript": "main.nf",
            "author": null
        },
        "complete": "2019-03-20T13:32:36+0000",
        "profile": "docker,test",
        "homeDir": "/Users/sven1103",
        "workDir": "/Users/sven1103/git/nextflow/work",
        "container": "nfcore/hlatyping:1.1.4",
        "commitId": "4bcced898ee23600bd8c249ff085f8f88db90e7c",
        "errorMessage": null,
        "repository": "https://github.com/nf-core/hlatyping.git",
        "containerEngine": "docker",
        "scriptFile": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping/main.nf",
        "userName": "sven1103",
        "launchDir": "/Users/sven1103/git/nextflow",
        "runName": "elated_murdock",
        "configFiles": [
            "/Users/sven1103/.nextflow/assets/nf-core/hlatyping/nextflow.config"
        ],
        "sessionId": "2a45ef7d-6dc8-4cbc-a51c-f74483ded0c9",
        "errorReport": null,
        "scriptId": "2902f5aa7f297f2dccd6baebac7730a2",
        "revision": "master",
        "exitStatus": 0,
        "commandLine": "./launch.sh run nf-core/hlatyping -profile docker,test -with-weblog 'http://localhost:4567'",
        "nextflow": {
            "version": {
                "minor": "03",
                "major": "19",
                "patch": "0-edge"
            },
            "build": 5114,
            "timestamp": "20-03-2019 13:25 UTC"
        },
        "stats": {
            "computeTimeFmt": "(a few seconds)",
            "cachedCount": 0,
            "cachedDuration": {
                "days": 0,
                "millis": 0,
                "hours": 0,
                "minutes": 0,
                "seconds": 0,
                "durationInMillis": 0
            },
            "failedDuration": {
                "days": 0,
                "millis": 0,
                "hours": 0,
                "minutes": 0,
                "seconds": 0,
                "durationInMillis": 0
            },
            "succeedDuration": {
                "days": 0,
                "millis": 37266,
                "hours": 0,
                "minutes": 0,
                "seconds": 37,
                "durationInMillis": 37266
            },
            "failedCount": 0,
            "cachedPct": 0.0,
            "cachedCountFmt": "0",
            "succeedCountFmt": "6",
            "failedPct": 0.0,
            "failedCountFmt": "0",
            "ignoredCountFmt": "0",
            "ignoredCount": 0,
            "succeedPct": 100.0,
            "succeedCount": 6,
            "ignoredPct": 0.0
        },
        "resume": false,
        "success": true,
        "scriptName": "main.nf",
        "duration": {
            "days": 0,
            "millis": 55688,
            "hours": 0,
            "minutes": 0,
            "seconds": 55,
            "durationInMillis": 55688
        }
    },
    "runId": "2a45ef7d-6dc8-4cbc-a51c-f74483ded0c9",
    "event": "completed",
    "runName": "elated_murdock",
    "runStatus": "completed",
    "utcTime": "2019-03-20T13:32:37Z"

sven1103 on 20 Mar 2019
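A sketch of how a consumer might pull the outcome fields from a "completed" payload shaped like the example above. The field names (runId, manifest.name, success, exitStatus, duration.durationInMillis, errorMessage) are taken from the thread; summarize_completed is a hypothetical helper. Because the later params preview nests the workflow fields under metadata.workflow, the sketch accepts both layouts.

```python
from typing import Any, Dict


def summarize_completed(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Extract outcome fields from a 'completed' weblog payload."""
    metadata = payload.get("metadata", {})
    # First example: workflow fields sit directly under metadata;
    # later preview: they are nested under metadata.workflow
    wf = metadata.get("workflow", metadata)
    summary = {
        "runId": payload.get("runId"),
        "pipeline": (wf.get("manifest") or {}).get("name"),
        "success": wf.get("success"),
        "exitStatus": wf.get("exitStatus"),
        "durationMs": (wf.get("duration") or {}).get("durationInMillis"),
    }
    if wf.get("success") is False:
        # On failure, errorMessage carries the human-readable detail
        summary["errorMessage"] = wf.get("errorMessage")
    return summary
```

Using .get throughout keeps the helper tolerant of the "started" message, where fields such as complete and duration are not yet meaningful.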


@sven1103 I agree with the others here that it would make sense to send it at workflow start and when the workflow finishes (regardless of whether it was successful or not).

johandahlberg on 20 Mar 2019


@johandahlberg It will now be sent when the workflow starts and when it completes. In case of failure, the boolean success property is false, and the errorReport and errorMessage properties contain more detailed information. At least it works when I intentionally break my pipeline :D

For example, with the Docker daemon not running:

...
"errorMessage": "docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.\nSee 'docker run --help'.",
...

sven1103 on 20 Mar 2019

Awesome work @sven1103 😁

Once the params are added then I think we should have pretty much everything we'll need 👍

ewels on 20 Mar 2019


@sven1103 cool! Looks very useful.

johandahlberg on 20 Mar 2019


Ok, params sneak preview (nf-core/hlatyping):

"metadata": {
     "params": {
            "container": "nfcore/hlatyping:1.1.4",
            "help": false,
            "outdir": "results",
            "bam": true,
            "singleEnd": false,
            "single-end": false,
            "reads": "data/test*{1,2}.fq.gz",
...},
   "workflow": {
            "start": "2019-03-20T19:30:08Z",
            "projectDir": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping",
            "manifest": {
                "nextflowVersion": ">=18.10.1",
                "defaultBranch": "master",
                "version": "1.1.4",
                "homePage": "https://github.com/nf-core/hlatyping",
                "gitmodules": null,
                "description": "Precision HLA typing from next-generation sequencing data.",
                "name": "nf-core/hlatyping",
                "mainScript": "main.nf",
                "author": null
            },
            "complete": null,
            "profile": "docker,test",
     ...
   }

sven1103 on 20 Mar 2019
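With params and workflow.manifest both in the payload, a monitoring page could label runs so that many runs of the same pipeline can be told apart, as raised earlier in the thread. A minimal sketch, assuming the nested layout of this preview; run_label is a hypothetical helper, and using the reads parameter as the distinguishing value is an assumption (other pipelines may use params.input or something else).

```python
from typing import Any, Dict, Optional


def run_label(payload: Dict[str, Any], param: str = "reads") -> str:
    """Build a label from manifest name/version plus one chosen parameter."""
    metadata = payload.get("metadata", {})
    manifest = metadata.get("workflow", {}).get("manifest") or {}
    params = metadata.get("params") or {}
    label = f"{manifest.get('name', 'unknown')} {manifest.get('version', '?')}"
    # Append the chosen parameter only if the run actually set it
    value: Optional[Any] = params.get(param)
    if value is not None:
        label += f" ({param}={value})"
    return label
```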


@rsuchecki workflow metadata will soon be sent as a JSON payload as well: #1077. This issue is solved IMO, so I will close it :)

sven1103 on 5 Apr 2019

