If I have a pipeline like this:
// Start with a Channel of Sample IDs and associated files
Channel.from(["Sample1", "Sample1.txt"],
             ["Sample2", "Sample2.txt"],
             ["Sample3", "Sample3.txt"],
             ["Sample4", "Sample4.txt"],
             ["Sample5", "Sample5.txt"],
             ["Sample6", "Sample6.txt"])
       .into{samples; samples2; samples3; samples4}

// Channel of sample ID pairs
Channel.from(["Sample1", "Sample2"],
             ["Sample3", "Sample4"],
             ["Sample6", "Sample4"])
       .into{sample_pairs; sample_pairs2}

samples3.join(sample_pairs2)
        .map { sample_ID, sample_file, pair_ID ->
            return [ pair_ID, sample_ID, sample_file ]
        }
        .join(samples4)
        .map { pair_ID, sample_ID, sample_file, pair_file ->
            return [ sample_ID, sample_file, pair_ID, pair_file ]
        }
        .println()
The output looks like this:
./nextflow run main.nf
N E X T F L O W ~ version 0.28.0
Launching `main.nf` [cheeky_heyrovsky] - revision: 799d622fe4
[Sample1, Sample1.txt, Sample2, Sample2.txt]
[Sample3, Sample3.txt, Sample4, Sample4.txt]
instead of like this:
[Sample1, Sample1.txt, Sample2, Sample2.txt]
[Sample3, Sample3.txt, Sample4, Sample4.txt]
[Sample6, Sample6.txt, Sample4, Sample4.txt]
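In this case Sample4 appears twice as a pair ID in the paired channel. At the second join only one of those two entries is matched against the single Sample4 item in the samples channel, so the other pair (here Sample6/Sample4) is lost.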
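The loss described above can be reproduced outside Nextflow. The following is a minimal sketch in plain Python, not Nextflow code: the helper names `one_to_one_join` and `reusing_join` are hypothetical, and the one-to-one matching rule is an assumption inferred from the output shown in this thread rather than taken from the Nextflow source. It models only the second join, where the key is the pair ID.

```python
from collections import defaultdict, deque

def one_to_one_join(left, right):
    """Pair each left item with at most one unused right item that has the same key."""
    pending = defaultdict(deque)
    for key, *rest in right:
        pending[key].append(rest)
    joined = []
    for key, *rest in left:
        if pending[key]:
            # The right item is consumed, so a later left item with the same key finds nothing.
            joined.append([key, *rest, *pending[key].popleft()])
    return joined

def reusing_join(left, right):
    """Pair each left item with every right item that has the same key (right items are reused)."""
    lookup = defaultdict(list)
    for key, *rest in right:
        lookup[key].append(rest)
    return [[key, *l_rest, *r_rest] for key, *l_rest in left for r_rest in lookup[key]]

# Left side of the second join: [pair_ID, sample_ID, sample_file]
keyed_pairs = [
    ["Sample2", "Sample1", "Sample1.txt"],
    ["Sample4", "Sample3", "Sample3.txt"],
    ["Sample4", "Sample6", "Sample6.txt"],
]
# Right side: [sample_ID, sample_file]
samples = [[f"Sample{i}", f"Sample{i}.txt"] for i in range(1, 7)]

lossy = one_to_one_join(keyed_pairs, samples)
complete = reusing_join(keyed_pairs, samples)

print(len(lossy), "rows with one-to-one matching")
print(len(complete), "rows when the right-hand item is reused")
```

In this model the one-to-one variant drops the second Sample4 row, while the reusing variant keeps all three pairs, which is the behaviour the question asks for.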
My current workaround is here: https://github.com/stevekm/nextflow-demos/blob/253b8e3ea8547d88f571e7c718e34696cf5b9be1/join-pairs/main.nf
However, it is very cumbersome. I think this could be resolved by offering options for 'left outer join', 'right outer join', or 'full outer join', since it appears that only an inner join is currently performed.
Yes, currently it does indeed behave as an inner join. Using remainder: true makes it a full outer join.
Would it be possible to get options for specifying the other types of joins?
It should be possible.
I have a problem with remainder: true. Not sure if it belongs here, but given this toy example:
left=[[a, 1], [a, 2], [b, 3]]
right=[[a, X], [b, Y]]
left.join(right) => [[a, 1, X], [b, 3, Y]]
left.join(right, remainder: true) => [[a, 1, X], [a, 2, null], [b, 3, Y]]
This is given that X and Y are files. Not sure if that is the reason. From the second operation I would have expected: [[a, 1, X], [a, 2, X], [b, 3, Y]]
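The gap between the observed and the expected result in this toy example can be illustrated with a small plain-Python model. This is a sketch under assumptions, not Nextflow code: `join_with_remainder` and `join_reusing_right` are hypothetical names, `None` stands in for `null`, and the matching rule (each right-hand item is consumed by one left-hand item) is inferred from the output quoted above. Unmatched right-hand items are not modelled, since the example has none.

```python
from collections import defaultdict, deque

def join_with_remainder(left, right):
    """One-to-one matching; left items with no unused partner are padded with None."""
    pending = defaultdict(deque)
    for key, value in right:
        pending[key].append(value)
    out = []
    for key, value in left:
        match = pending[key].popleft() if pending[key] else None
        out.append([key, value, match])
    return out

def join_reusing_right(left, right):
    """Every left item receives the right value for its key (assumes one right item per key)."""
    lookup = dict(right)
    return [[key, value, lookup[key]] for key, value in left if key in lookup]

left = [["a", 1], ["a", 2], ["b", 3]]
right = [["a", "X"], ["b", "Y"]]

observed = join_with_remainder(left, right)
expected = join_reusing_right(left, right)

print(observed)
print(expected)
```

Under this model the `null` in `[a, 2, null]` comes from `X` having already been paired with `[a, 1]`, not from `X` and `Y` being files.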