In many cases, I want to collect several CSV files that share the same structure. The ideal operator would skip their headers while collecting the files and then reattach the first header it encountered before emitting the collected file.
+1
Plus one. For some processes, such as a BLAST search against the nr database, we can split the FASTA input and run the chunks separately. However, it then costs another process to remove the redundant headers.
👍
Had to do this more times than I'd care to admit :)
"""
# Write the header once, then append the collected (header-less) rows.
printf "${HEADER}" > out_temp
cat ${out} >> out_temp
"""
The only problem I see here is how to handle many CSV files split into many chunks that may have different headers. Which header is supposed to be applied to the resulting collected files: the first one in any case, or the last one processed?
Honestly, I don't know. I am personally interested in the case where I'm collecting several CSV files that all have the same header. Perhaps a first idea could be to collect all the headers, compare them, and emit a warning if they don't match.
A bit too complicated. I would go with the assumption that they are equal, so only the first is retained and applied to the collected files.
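To make the two options above concrete, here is a minimal Python sketch of the behaviour being discussed. It is not Nextflow's implementation; the function name `collect_keep_header` and the parameters `header_lines` and `warn_on_mismatch` are hypothetical, chosen only for illustration. It keeps the header of the first chunk, drops the header of every later chunk, and can optionally warn when a later header differs.

```python
import warnings
from typing import Iterable, List, Optional


def collect_keep_header(chunks: Iterable[str], header_lines: int = 1,
                        warn_on_mismatch: bool = False) -> str:
    """Concatenate text chunks, keeping only the header of the first one.

    `header_lines` (hypothetical parameter) is the number of leading
    lines of each chunk that are treated as its header.
    """
    merged: List[str] = []
    first_header: Optional[List[str]] = None
    for chunk in chunks:
        lines = chunk.splitlines()
        header, body = lines[:header_lines], lines[header_lines:]
        if first_header is None:
            # First chunk: remember its header and emit it once.
            first_header = header
            merged.extend(header)
        elif warn_on_mismatch and header != first_header:
            # Later chunk with a different header: the first one still wins.
            warnings.warn("header differs from the first one; keeping the first")
        merged.extend(body)
    # Re-terminate every line so chunks without a trailing newline join cleanly.
    return "".join(line + "\n" for line in merged)


if __name__ == "__main__":
    a = "Name,Surname\nLuca,Cozzuto\n"
    b = "Name,Surname\nPaolo,Di Tommaso\n"
    print(collect_keep_header([a, b]), end="")
```

With `warn_on_mismatch=False` this follows the simpler assumption that all headers are equal; turning it on adds the comparison step without changing which header is kept.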
I'd love that feature!
And now, the most difficult question: keepHeader or keepHeaders ? 😄
keepHeader of course! We want to preserve one header after all..😉
I would use keepHeader :) But then you have to specify either the number of rows or the presence of a character able to identify the header (like #)
keepHeader won.
"you have to specify either the number of rows or the presence of a character able to identify the header"
I'm not sure I understand.
Examples:
Name Surname
Luca Cozzuto
Paolo Di Tommaso
So here the header is row 1
Again:
This is the wonderful format made by some crazy bioinfo
Name Surname
Luca Cozzuto
Paolo Di Tommaso
So here the header is composed of rows 1-2
Finally
Luca Cozzuto
Paolo Di Tommaso
So here the variable-length header is identified by a prefix (#).
PS: I hope there won't be a combination of the two situations :)
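The two ways of identifying a header described above (a fixed number of rows, or a prefix character) can be sketched in Python as follows. This is only an illustration under stated assumptions: `split_header`, `rows` and `prefix` are hypothetical names, and the `#`-prefixed sample line in the usage part is made up for the example, not taken from a real file.

```python
from typing import List, Optional, Tuple


def split_header(text: str, rows: Optional[int] = None,
                 prefix: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Split text into (header lines, body lines).

    Exactly one of `rows` (fixed header size) or `prefix` (marker that
    starts every header line) must be given; both names are hypothetical.
    """
    if (rows is None) == (prefix is None):
        raise ValueError("specify exactly one of rows or prefix")
    lines = text.splitlines()
    if rows is not None:
        # Fixed-size header: the first `rows` lines.
        return lines[:rows], lines[rows:]
    # Prefix-based header: all leading lines that start with the marker.
    count = 0
    while count < len(lines) and lines[count].startswith(prefix):
        count += 1
    return lines[:count], lines[count:]


if __name__ == "__main__":
    table = "Name Surname\nLuca Cozzuto\nPaolo Di Tommaso\n"
    print(split_header(table, rows=1))
    # Assumed sample: a header line marked with '#'.
    print(split_header("#Name Surname\nLuca Cozzuto\n", prefix="#"))
```

Requiring exactly one of the two arguments sidesteps the combined case mentioned in the PS.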
ummmmmmmmmmm, multi-line headers !
I think I'm going to the beer session ..
Available for testing in version 0.27.0-SNAPSHOT.
This was more tricky than expected. You guys owe me a 🍺 or a ☕️ at your choice. 😄