Syndesis: Getting only first row from SQL connector

Created on 16 Oct 2018  ·  55Comments  ·  Source: syndesisio/syndesis

This is a...


[ ] Feature request
[ ] Regression (a behavior that used to work and stopped working in a new release)
[x] Bug report  
[ ] Documentation issue or request

The problem


Seems that we're only passing the first row of an result set down the exchange.

https://github.com/syndesisio/syndesis/blob/be740466d017a5d2c20ca84044f1583b734ea6a2/app/connector/sql/src/main/java/io/syndesis/connector/sql/customizer/SqlConnectorCustomizer.java#L71-L72

It might be because we wanted to split and process (map?) each row individually but I can't find how this would be done in the code.

This is quite an issue for API provider-based integrations and API's that would fetch more than one row from the database, say to implement GET /task endpoint.

To address this we should probably set the output data shape of the SQL connector to something like:

{
  "results": [
    {"col1":"data1"},
    {"col2":"data2"}
  ]
}

So this would mean change the dynamic data shape generation (meta) and change in the SqlConnectorCustomizer::doAfterProducer.

Expected behavior


Put all rows on the exchange.

Tasks involved / Steps to Reproduce

  1. create timer to log integration (logging body)
  2. add invoke SQL action, using sample db connection with select * from todo SQL statement
  3. insert two or more rows into the sampledb.todo table
  4. notice that only the first row is logged
cadiscussion cafeature closeverified grouconnector

Most helpful comment

I think we need to work on https://github.com/syndesisio/syndesis/issues/4337 to properly solve this. The idea is that the DataShape describe the shape of the elements of the collection and then we do add some metadata/tag to the DataShape definition to tell UI/DataMapper that the action produces a collection.

How this flag is consumed is not yet defined

All 55 comments

I think that there's a number of different things here to take into account that are not related to the sql connector only but let's take it as example:

  1. sql action used as "from": in this case subsequent steps receive one exchange per row
  2. sql action used as final "to": in this case the result is just ignored
  3. sql action used as intermediate "to": in this case the subsequent steps receive only the first row, everything else is discarded and this is what this issue is about so we can apply the same solution as the case "from" and we can split the result into multiple exchanges and that should be fine for most of the work-flows

Things get a little bit more complicated when the result of the route matters i.e. in case of an API Provider and there's no clear solution for this case, so far we have the following suggestions:

  1. as per #3896, add an option to enable/disable split
  2. as per #3896 comment we could perform some sort of aggregation A think to bear in mind is that a flow can be composed by multiple split so if the response matter, aggregation may need to happen at multiple level.

Whatever solution we find, I think that we need some sort of visualization about what a step can produces and we also need to introduce the Split/Aggregate Steps.

@zregvart @KurtStam @gaughan @gashcrumb what do you think ?

Nice and considerate summary @lburgazzoli :+1:, thanks!

I think this needs to be in the hands of the user with a strong helping hand from us. We should, perhaps based on the cardinality of the data shapes, suggest or add automatically (if it doesn't end up too clunky for the user) appropriate split/aggregate steps. The guiding principle behind this should be that we offer no surprises to the user. A SELECT * FROM table followed by API provider return path should return all rows in the API response, if the same query is followed by Gmail connector to send e-mail we should offer split and template. I think if we brainstorm a bit we can find rules to guide this and I think data shapes should give us clues on that. (science fiction?)

Not for 7.2 (just in case :) )
This is super interesting and adds more weight (as if we needed any) behind the need for split/join steps.
👍

So I guess this fix goes after this print correct ?

@TovaCohen Can we release note this as a limitation.

Should we convert this from cat/bug to cat/feature, as it works as we expect until the new information with split being used within integrations that expect the bulk of the payload (such as API provider)?

Sure, we can provide a release note about this. But what should it say? Some questions:

Does this note apply to only the provided PostgresDB connection, or does it apply to connections to all SQL databases?

If I am interpreting Luca's 1, 2, 3 summary correctly:
For a SQL DB connection that is the start connection, - "subsequent steps receive one exchange per row", which sounds to me like there is not a problem. Set me straight please. When a SQL connection is the start connection, is there a problem? If yes, what is the incorrect behavior that the user will see? Is there a workaround?

For a SQL DB connection that is the finish connection, "the result is just ignored" - what does this mean? what is the behavior that the user will see? Does a finish connection to a SQL DB successfully insert, update or delete one record? multiple records? If not, is there a workaround?

For a SQL DB connection that is in the middle of an integration, and this case includes API Provider operation flows, - "subsequent steps receive only the first row, everything else is discarded" - Is this correct for part of the release note:

When a SQL connection is in the middle of an integration, a SQL statement that is supposed to retrieve more than one record incorrectly returns only the first record in the set. Subsequent integration steps receive only that first record.

Is there a workaround for this?

What about SQL connections in the middle that insert, update or delete more than one record? Are they successful?
Thanks!

@zregvart sounds good

@TovaCohen I don't think this is an error requiring a workaround, rather it is: by design right now, with this feature enhancement pending.

Okay, then what does the release note need to say?

@TovaCohen We can just remove the finish connector detail, the rest looks good to me

On top of that, it’s by design for every connector that can return more than one entity. Twitter Search would be another example.

On Oct 22, 2018, at 4:31 PM, Gary Gaughan notifications@github.com wrote:

@TovaCohen https://github.com/TovaCohen I don't think this is an error requiring a workaround, rather it is: by design right now, with this feature enhancement pending.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/syndesisio/syndesis/issues/3856#issuecomment-431852727, or mute the thread https://github.com/notifications/unsubscribe-auth/AACK-CRrebbPcVSgEYldoeGJZLlRLfdHks5undbZgaJpZM4XeIZo.

I'm really sorry but I am not understanding what information you want to see in a release note. Could someone try writing the release note? And then maybe I can edit it? or ask questions? At this point, I don't know what to suggest or what to ask. And it almost sounds like an update to the user doc itself, and not just a release note, is what is required. But I really don't know :-(

@TovaCohen let's start with something like (bit clunky):

Connectors in Fuse Online work with the assumption that processing individual results is preferable to processing results in bulk. For instance when integrating Twitter with Salesforce, it'll process individual Tweets to create or update equal number of Salesforce Contact records. This is performed by automatically adding a splitting step that takes results of Twitter search, that might contain multiple Tweets, splits them to individual Tweets and processes each Tweet individually.
This assumption causes a side-effect of receiving only the first result, i.e. the first row, when a SQL database connection that returns more than one row from a SQL database is used in a integration utilizing an API provider operation.

@zregvart thanks, LGTM. @TovaCohen Does this work for you?

Thanks Zoran! It sounds like this applies to only API Provider operations that connect to SQL databases and that execute SELECT statements. If that's all correct, then I think this should be a note in the API Provider user doc, and not a release note. The API Provider is a Technology Preview feature, and its features are limited. This is just another limitation, just like the limitation that you must set one return code and there's no error handling. If you agree, then I would add this note to the API Provider doc, where it talks about adding connections to operation flows:

NOTE: In this Technology Preview release, a connection to a SQL database that executes a SELECT statement returns only the first record in the result set.

How's that?

Sounds good.

Well it does apply to (almost?) all connectors, but the most visible point is the API provider. I guess I'm thinking narrowly about this and I feel reaffirmed by the lack of issues citing that this is an bug -- so let's scope it with API provider.

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

keep open

I am about to implement collection support for the SQL connector as part of the API provider collection support.

I intend to just return the whole result set as a collection instead of returning just the 1st row.

I am not sure what happens to the existing integrations in case we change the behavior of the customizer. Let's say you have an existing integration that is using the SQL connector to obtain a result set from DB in the middle of an integration. If we change the behavior to return the whole collection of results instead of returning just the 1st row what is going to happen (e.g to existing data mapping steps in that same integration after the SQL connector)?

I think I will just try that in a manual test to see how the integration would behave. Based on that we might need to think about integration migration when upgrading to Fuse 7.3.

So I put this to a test and to my amazement both versions

  1. SQL connector just returning the first result (what we have today)
  2. SQL connector returning the complete list of results

worked in the exact same way!

What I did:

  • Create an integration with the current implementation

    • timer -> sql -> data-mapping -> google-sheets

    • As expected the SQL connector just returned the first row and this was mapped and sent to Google Sheets

  • Update backend so SQL connector returns the complete result set as collection
  • Update Syndesis with new implementation
  • Restart the integration

    • The SQL connector returns the collection

    • Data mapper just maps the first entry and this is sent to Google Sheets

  • Exact same behavior as before!

Also published the integration once again and it also worked as before.

This is amazing work @christophd!

Is this due to Atlasmap ignoring the collection?

Is this due to Atlasmap ignoring the collection?

If it's so it's a feature we're very proud of :)

I'm guessing that either the connector or the integration runtime make enough provisions for both cases to be valid at the same time.

I'm guessing that either the connector or the integration runtime make enough provisions for both cases to be valid at the same time.

Not sure about this. The fact that it is working in both cases is only based on the actual data mapping behavior, I guess.

@zregvart totally a new feature!
@christophd sweet!

@christophd Looking at your comments in https://github.com/syndesisio/syndesis/issues/3856#issuecomment-458863787, what did you expect to see? The fact that it's "working the same way" doesn't seem to be right to me.

Looking at your comments in #3856 (comment), what did you expect to see?

I had no idea what was going to happen. Honestly I was expecting something to go wrong so I can then see how we can migrate old integrations to fix it.

Ah maybe it doesn't error out if the actual payload is unexpectedly JSON array, rather collect all of items in it. But I'm not sure it works perfectly end-to-end in any case... It's better to have it as a collection at design time.

I think we need to work on https://github.com/syndesisio/syndesis/issues/4337 to properly solve this. The idea is that the DataShape describe the shape of the elements of the collection and then we do add some metadata/tag to the DataShape definition to tell UI/DataMapper that the action produces a collection.

How this flag is consumed is not yet defined

So here's what I don't understand: why invent a flag to tell that the payload is a collection when the datashape can describe a collection?

Seems to me overly complicated to implement it like that.

The split and aggregate steps can both have datashapes where split's step is the datashape of the collection element and the aggregate step's datashape is a collection of the output shape of the step before it. Doesn't seem to me that complicated to accomplish. And we're with a flag we're increasing complexity and introducing more tech dept.

To separate the multiplicity info of document root from DataShape specification itself so that the specification could be same?
In any case, the final JSON schema ends up with having topmost array, but it's going to be wrapped dynamically with looking at that flag?

I feel we're blurring the line between the mapper and the integration runtime. I think there should be a clear separation and a defined contract. Seems to me that the datashape is that separation and contract and that we should solve all problems we face in that context.

@zregvart my rationale for such proposal is that I would like to avoid the UI to process the data shape definition to extract the shape of the split step or to wrap it again as a collection when aggregating to reduce the logic to put on the UI side (thus maintenance) but that's not written in stone and can be implemented as you suggested if easier/better.

@igarashitm if we go for a element+flag I would like to avoid wrapping the schema and having this supported natively by AtlasMap so all the logic is inside it and the only thing UI has to do is dealing with the flag.

No need for that to be on the UI side, the backend can freely change shapes as we do with dynamic data shapes for actions. Would be cool if we had this as helper functionality in Atlasmap (not saying that it needs to be implemented by Atlasmap team, this would be something Syndesis backend team could contribute).

Mh, how can the back-end change the data shape while the integration is being edited ? UI would need to set the shape of the split step according to the previous step to make atalasmap working properly (afaik) at design time.

Or UI need to invoke a back-end endpoint to do so.

A bit gray zone I feel, actually for Java documents, it's not feeding parameterized type like java.util.LinkedList<io.atlasmap.MyOrder> as it's not an official class name, rather feeding:

  • className: io.atlasmap.MyOrder
  • collectionType: CollectionType.LIST
  • collectionClassName: java.util.LinkedList

But JSON schema supports describing root element as an array, and we need to wrap with JSON array before inspection if it's topmost collection. However, who does that wrapping? either Syndesis or AtlasMap. Just a matter of "which one does".
In the initial discussion I think we concluded to do it in Syndesis meta.
But if there's any problem doing it in Syndesis, we can also implement that schema modification in AtlasMap inspection by providing that flag. It's going to be exact same logic, it needs to modify JSON schema and put the topmost JSON array right before inspection.

Not easy like a post-inspection tweak, as all the fields get their full path during inspection, /order/orderId vs. /<>/order/orderId

Mh, how can the back-end change the data shape while the integration is being edited ? UI would need to set the shape of the split step according to the previous step to make atalasmap working properly (afaik) at design time.

We already provide a data shape for dynamic actions, we can use the same principle.

Or UI need to invoke a back-end endpoint to do so.

Yeah, that's how it usually goes...

It might be controversial, but I'm OK with supporting just JSON for this.

And I think in the long run we might find it much easier to just convert everything to JSON internally and describe it as JSON, than worry about Java classes.

Either that or we need to re-visit how we define data shapes, if we're going to have to consider every type of data shape in every situation, we're probably missing a good model for describing data shapes.

I think I'm done trolling for today, thank you for letting me vent for a bit :100:

So for back-end we need:

  • implement step/aggregate meta data retrieval
  • DataShape definition need to be enriched to add additional information needed by the GenerateInspectionsMojo to properly generate the specification
  • maven connector plugin and maven extension plugin need to be updated
  • Update extension annotation

@zregvart do you want to take it ? :)

FWIW it's gonna be JSON only as Syndesis depends on offline inspection for Java. XML prohibit multiple root element.

DataMappers can live inside and outside of a split/aggregate block. Therefore the DataShape defined in the connector should not describe a collection as root because it really depends on where the actual mapping is performed.

Consider the following integrations:

integrationA: connectorA -> data-mapper -> connectorB
integrationB: connectorA -> split -> data-mapper -> connectorB -> aggregate

connectorA provides a collection and this is passed to the data-mapper as is in integrationA. In integrationB the same collection is explicitly split before the mapping so the data-mapper is provided with single elements of the same DataShape for mapping.

In my opinion both data-mappers here should be provided with the exact same DataShape information. A DataShape saying that it is of type JSON array would be wrong for the mapping in integrationB.

The fact that data-mapper is operating on a collection or a single element should be given by some kind of information other than the DataShape itself.

I see following options how this collection-type information could be given to the mapper:

  1. Explicitly given by the user when adding the mapper to the integration
  2. Evaluated by the integration runtime

Option 2 can become overly complicated as you could also have an integration like this:

integrationC: connectorA -> split -> sql -> data-mapper -> connectorB -> aggregate

Where the mapping is inside a split but the SQL returns a collection of results that should be passed to connectorB. So the mapping is acting on a collection coming from SQL within a split of the elements coming from connectorA.

So for back-end we need:
* implement step/aggregate meta data retrieval

This is the only thing I think we need to have

* DataShape definition need to be enriched to add additional information needed by the GenerateInspectionsMojo to properly generate the specification

Why?

* maven connector plugin and maven extension plugin need to be updated

Why?

* Update extension annotation

Why?

@zregvart do you want to take it ? :)

Sure, the first task doesn't seem like a lot of work, not sure about the other tasks, why do you think they're needed?

DataMappers can live inside and outside of a split/aggregate block. Therefore the DataShape defined in the connector should not describe a collection as root because it really depends on where the actual mapping is performed.

I think this assumes that the split/aggregate have no data shapes.

I'm suggesting that they do have data shapes. Then the split's output data shape needs to be the shape of the single element of a collection and the aggregate's output data shape needs to be the shape that describes a collection of the data shape of the step preceding it.

So for back-end we need:

  • implement step/aggregate meta data retrieval

This is the only thing I think we need to have

* DataShape definition need to be enriched to add additional information needed by the GenerateInspectionsMojo to properly generate the specification

Why?

* maven connector plugin and maven extension plugin need to be updated

Why?

* Update extension annotation

Why?

@zregvart do you want to take it ? :)

Sure, the first task doesn't seem like a lot of work, not sure about the other tasks, why do you think they're needed?

because when the type is a List, as @igarashitm said, we need to provide some additional info to the inspector mojo to make it working and this has effects to some other code.

We also need to rework the meta data retrieval for some connectors (i.e. sql, servicenow, salesforce) and the data shape definition for some others as well (i.e. twitter) as having a proper db migration task.

because when the type is a List, as @igarashitm said, we need to provide some additional info to the inspector mojo to make it working and this has effects to some other code.

Perhaps I didn't make myself clear, I think we should only support JSON for top level collections (in this iteration). If you agree with that the change should only affect the data shape in the connector descriptor and dynamic metadata extension.

I think we should give Java class inspection support a bit more time to mature, seems to me that we're jumping to the first possible solution by adding additional flag/properties.

We also need to rework the meta data retrieval for some connectors (i.e. sql, servicenow, salesforce) and the data shape definition for some others as well (i.e. twitter) as having a proper db migration task.

I'm acquainted better with SQL and Salesforce connectors, I think for them a change to JSON/JSON schema would be simple. I'm not familiar with ServiceNow and Twitter, though I don't think it should be that complicated.

Looks like there is something I’m missing: I do want to put a split step
after a twitter action, is that not going to be supported ?

On Thu, 31 Jan 2019 at 06:06, Zoran Regvart notifications@github.com
wrote:

because when the type is a List, as @igarashitm
https://github.com/igarashitm said, we need to provide some additional
info to the inspector mojo to make it working and this has effects to some
other code.

Perhaps I didn't make myself clear, I think we should only support JSON
for top level collections (in this iteration). If you agree with that the
change should only affect the data shape in the connector descriptor and
dynamic metadata extension.

I think we should give Java class inspection support a bit more time to
mature, seems to me that we're jumping to the first possible solution by
adding additional flag/properties.

We also need to rework the meta data retrieval for some connectors (i.e.
sql, servicenow, salesforce) and the data shape definition for some others
as well (i.e. twitter) as having a proper db migration task.

I'm acquainted better with SQL and Salesforce connectors, I think for them
a change to JSON/JSON schema would be simple. I'm not familiar with
ServiceNow and Twitter, though I don't think it should be that complicated.


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
https://github.com/syndesisio/syndesis/issues/3856#issuecomment-459305163,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AByEhSOnPmzSKZcsUFho-Y1RgHOexPJtks5vIs4qgaJpZM4XeIZo
.

>

--
Luca Burgazzoli

I think this assumes that the split/aggregate have no data shapes.
I'm suggesting that they do have data shapes.

@zregvart I had no idea that steps can have data shapes and that they can use dynamic meta data retrieval. I thought this was limited to connectors only. I could definitely use some help when we go that path.

So since Syndesis runs Java inspection offline, you can't have a dynamic DataShape for Twitter which has Java DataShape ATM. In order to achieve that, we need to enable Java online inspection in syndesis-meta with having all the Java dependencies (connector, libraries, extensions) in syndesis-meta classpath.

I wonder, if it's possible to store both of inspection result, twitter4j.Status and List<twitter4j.Status> and then Syndesis choose one of them along with action configuration?

If the stumbling block is supporting Java data shapes, I think we can do without them for the first iteration and then consider what to do about them later.

Perhaps having two data shapes per connector action, one holding the shape of the collection and one holding the shape of the element is a good idea at that point perhaps we can make provisions to make that simpler from the data shape of a collection. Let's just not block on this.

The following connectors definition have one or more java datashape so I don't think we can go ahead without having support for java ones:

  • aws-s3/src/main/resources/META-INF/syndesis/connector/aws-s3.json
  • fhir/src/main/resources/META-INF/syndesis/connector/fhir.json
  • gmail/src/main/resources/META-INF/syndesis/connector/gmail.json
  • google-calendar/src/main/resources/META-INF/syndesis/connector/calendar.json
  • google-sheets/src/main/resources/META-INF/syndesis/connector/sheets.json
  • salesforce/src/main/resources/META-INF/syndesis/connector/salesforce.json
  • servicenow/src/main/resources/META-INF/syndesis/connector/servicenow.json
  • slack/src/main/resources/META-INF/syndesis/connector/slack.json
  • telegram/src/main/resources/META-INF/syndesis/connector/telegram.json
  • timer/src/main/resources/META-INF/syndesis/connector/timer.json
  • twitter/src/main/resources/META-INF/syndesis/connector/twitter.json

Of course not every connector has a magic split defined but that is to show how relevant the java data shapes are.

Was this page helpful?
0 / 5 - 0 ratings