Hi, a few days ago I wrote script that sends the real time events to Event Hub. I used Azure Durable Functions to implement parallelism, and in doing so I used partition_key to ensure events with same id is directed to same partition. For some reason I thought the key can be single number and thus I used int types as partition keys, and while sending I never had any error about it. Later I implemented another function App that listens to Event Hub, and triggered when new message is received in Even Hub, and while doing so I was getting the following error at each firing:
Unable to cast object of type 'System.Int32' to type 'System.String
At first I thought may be this is due to encoding of str files that send which took me quite a long time to figure out this was because my usage of int value as partition key due to which Event Hub backend could not generate partition key hashes.
I am wondering why no Exception is raised if int type is passed as partition_key argument at the time of sending message string, and may be this is bug to be considered.
@aliyevmirzakhan thanks for reporting this, @yunhaoling, @YijunXieMS can you take a look at this?
Hello @aliyevmirzakhan ,
May I first confirm which language and what version of the EventHub sdk are you using?
Because from the error information Unable to cast object of type 'System.Int32' to type 'System.String, it looks like you're using EventHub sdk of .NET, while this repo mainly focuses on Python SDKs.
I believe the issue is more likely somewhere in the azure event hubs binding for functions. I don't know the protocol well enough to definitively state if an integer is a valid partition key or not - but if my understanding is correct, the value gets transmitted all the way to the receiver, which would imply that it is.
Hi @aliyevmirzakhan. Would you be so kind as to help clarify a bit more about your scenario so that we can make sure to get the right folks involved to assist? It would be helpful to understand:
How is the event being published to Event Hubs? The .NET SDKs only allow a string to be used as a partition key when publishing events, so it would seem that the event is being published using some other means. Is it the Python SDK being used to publish the event?
Are the Azure Functions that you're using both Python, both .NET or a mix? Can you clarify which stack is generating the exception?
@johanste Yes the protocol supports string, int (and maybe other types) from our test.
We decided to make it a string for all languages back then. Python has type annotation and documentation to indicate it's a string like below.
:keyword str partition_key: ...
But we don't raise an error when a user passes an int. If we add the validation now, is this regarded as a break change?
What does the service support?
The service contract is not formally documented, to my knowledge. The closest that I'm aware of is the AMQP 1.0 in Azure Service Bus and Event Hubs protocol guide, which does not contain formal type information. The entry for the PartitionKey annotation simply links back to the Track Zero client property, which is typed as string.
In our design discussions with the service team, my recollection is that they requested string be used for both Partition Key and Partition Id. I'll ask @ramya-rao-a to confirm that is her understanding as well. For what it's worth, the rest of the languages for both Track One and Track Two are allowing only string.
What does the service support
Partition Key is set in the message annotations part of the AMQP message which is essentially a map i.e key-value pair. The type of the value is retained. Therefore, if user passes a non string value for partitionKey and the client passes it along as is, the service will receive the message with partition key as a non string value.
The service does not seem to have any issues with such a message.
We can follow up with the service team on exactly what they do here.
The error we are seeing here (and in https://github.com/Azure/azure-event-hubs-spark/issues/424) is on the receiving side where the client expects a string, but finds a non string value. We read again from the same message annotations part of the AMQP message which I believe does not get mutated and has the same value that the sending client had set.
This is a problem with interop across different clients.
While we can do our due diligence when sending to always ensure that partitionKey is set as string (JS SDK does this, we cast whatever user provides as partitionKey to string before setting it in the message annotations), we are not responsible for third party AMQP implementations out in the wild.
Since the service does not seem to have a problem reading and using a non string partitionKey, it would be interesting to see what we can do at the receiving end to solve this.
Hi I am using Python runtime for both sending messages to Event Hub, and receiving them in a Function App that is triggered by Event Hub.
Thank you, @aliyevmirzakhan.
Based on that, I think @johanste's initial assessment is likely. The Azure Functions infrastructure is based on .NET and the bindings are running in the context of that infrastructure using the .NET Event Hubs client rather than running in the context of the Python language worker and using the Python SDK.
The root cause would appear to be that the partition key is enforced as string by the Track One and Track Two client libraries for all languages other than Python. The question, in my opinion, comes down to "should we guarantee that events published from a language can be read by the other languages and across versions of the SDK?"
Assuming so, I believe the choice comes down to:
object value and, if so, does that include the track one libraries as well?As @ramya-rao-a mentions, so long as the service team does not have a service-specific reason to disallow, the first is probably our most promising starting point. The tricky part of that will be that the Azure Functions bindings are still using the track one Event Hubs client.
cc @serkantkaraca for input from the service team
Both track-0 and track-1 .NET clients explicitly cast partition key value from message-annotations-map into EventData.PartitionKey. Service doesn't do any validation on the delivered type and assumes sender library already conducted sanitary checks. This applies to every entry in the annotations map as far as I know.
Unfortunately the messaging AMQP guide that @jsquire pointed doesn't mandate types. We should have mentioned expected types there.
@serkantkaraca how is the partition_key used in the service to decide which partition an event is sent to when it's an int and when it's a str?
@YijunXieMS service ignores partition-key if it is not string type.
We are investigating this issue, and will report back with an update by 11/4/2020.
Thanks for reviving this topic @samuelkoppes
Addressing @jsquire's options below
Do we investigate whether there is a way to be looser with type checking and coerce non-string values to string in the non-Python libraries?
On the receiving side, we can either coerce non string values to string, or ignore the non string values altogether. I would prefer the ignoring approach to be in sync with what the service does and not mislead the user to think that their non string partition key was actually used by the service. We can also check if the service would consider doing this instead of the client i.e. remove the partition key from the message annotation since it is invalid.
Do we introduce a breaking change to the Python library to validate the type matches the documentation?
No, breaking change is not an option.
Do we leave the Python behavior as-is and consider a stronger warning in documentation about cross-language compatibility?
Yes, we should.
Do we introduce breaking changes to the other libraries to allow a generic object value and, if so, does that include the track one libraries as well?
No, breaking change is not an option.