I have a Cosmos DB account on Azure. I inserted 10 rows with primary key "unique_ID" from Databricks using the Spark connector "azure-cosmosdb-spark_2.4.0_2.11-1.3.4-uber.jar".
The Cosmos DB container is configured with unique_ID as a unique key.
Multiple issues:
connectionConfig = {
  "Endpoint" : "https://...",
  "Masterkey" : "...",
  "Database" : "...",
  "preferredRegions" : "East US",
  "Collection" : "...",
  "Upsert" : "true"
}
data.write.format("com.microsoft.azure.cosmosdb.spark").options(**connectionConfig).save()
This fails!
Error: Writing to a non-empty table.
So, I try the following query instead:
data.write.format("com.microsoft.azure.cosmosdb.spark").mode("append").options(**connectionConfig).save()
This fails too!
Error: Unique index constraint violation.
Replacing mode "append" with "overwrite" gives exactly the same error message.
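For context, with the azure-cosmosdb-spark connector the upsert behavior is driven by the "Upsert" option in the write config, combined with append mode. Below is a minimal sketch of the write path described above; the endpoint, key, database, and collection values are placeholders, and whether this avoids the unique-index error in practice is exactly what is in question here:

```python
# Sketch: upsert rows into an existing Cosmos DB container via the
# azure-cosmosdb-spark connector. All account details are placeholders.
writeConfig = {
    "Endpoint": "https://<account>.documents.azure.com:443/",
    "Masterkey": "<master-key>",
    "Database": "<database>",
    "Collection": "<collection>",
    "preferredRegions": "East US",
    # With Upsert=true the connector is expected to replace documents
    # that share an id rather than fail the batch.
    "Upsert": "true",
}

def upsert_to_cosmos(df, config):
    """Append-mode write; append is required once the container has data."""
    (df.write
       .format("com.microsoft.azure.cosmosdb.spark")
       .mode("append")
       .options(**config)
       .save())
```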
How do I go about updating my records?
What I don't understand is: shouldn't "overwrite" at least work and replace the whole collection with the new records?
Any help would be appreciated.
Thanks!
Thanks for your comment. We are actively investigating and will get back to you shortly. Thanks for your patience.
@shrey-agarwal In order to better assist with your query, could you please share the document link you are referencing above?
@KranthiPakala-MSFT I used the following documentation (here).
The doc shows one way to write to Cosmos DB, but makes no mention of "upsert" or of inserting into a non-empty container (which apparently requires a different form of the command, surprisingly).
@shrey-agarwal Thank you for the additional detail. I have added the doc footer to this issue. We are looking into this further.
@shrey-agarwal Are you using the Cassandra API by chance? The following doc shows how to perform an UPSERT using Spark: Upsert data into Azure Cosmos DB Cassandra API from Spark (link)
@Mike-Ubezzi-MSFT No, my application does not use the Cassandra API. For the time being, it's a simple Cosmos DB (Core SQL) account that I created in the Azure portal.
@shrey-agarwal Can you try the following:
// Write to Cosmos DB from the flights DataFrame
import org.apache.spark.sql.SaveMode
flights.write.mode(SaveMode.Overwrite).cosmosDB(writeConfig)
Where you are attempting to use:
data.write.format("com.microsoft.azure.cosmosdb.spark").mode("append").options(**connectionConfig).save()
This repository may be of more assistance here: Azure Cosmos DB Spark Connector User Guide (link)
@Mike-Ubezzi-MSFT - The repository and example you are referring to are for Scala. The equivalent Python syntax for the same query is the one I am already using (Python does not have a SaveMode module).
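To make the Scala-to-Python correspondence concrete: PySpark has no SaveMode enum; DataFrameWriter.mode() takes a string instead. A sketch of the mapping and the Python form of the suggested write (the function and config names here are illustrative, not from the docs):

```python
# Scala SaveMode enum values and the strings PySpark's mode() accepts.
SCALA_TO_PYSPARK_MODE = {
    "SaveMode.Overwrite": "overwrite",
    "SaveMode.Append": "append",
    "SaveMode.Ignore": "ignore",
    "SaveMode.ErrorIfExists": "error",
}

def write_overwrite(df, write_config):
    # Python equivalent of:
    #   flights.write.mode(SaveMode.Overwrite).cosmosDB(writeConfig)
    # The string "overwrite" stands in for SaveMode.Overwrite.
    (df.write
       .format("com.microsoft.azure.cosmosdb.spark")
       .mode("overwrite")
       .options(**write_config)
       .save())
```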
@shrey-agarwal Can you use Scala? And did you try the following:
flights.write.format("com.microsoft.azure.cosmosdb.spark").options(**writeConfig).save()
@Mike-Ubezzi-MSFT This is the exact same query I tried and included in my question initially. It works when the container is empty, but fails with the "non-empty table" error when the container already has data.
As for Scala, I would have to move my code logic and all the processing over, which could take some time; I would prefer a Python solution. Were you able to reproduce the bug?
This is outside my area of expertise. I am looking for a solution for you, but at this time your best bet is to take this to the Cosmos DB MSDN Forum (link). In the meantime, I am assigning this to the document content owner to review the code examples for accuracy, and I have escalated it to the product group.
I would assume append should work since upsert is set to true; I am not sure whether a different method is needed to achieve this operation. I have sent an email to Ram for more details.
@shrey-agarwal - this could be due to a unique key constraint violation within the DataFrame you are trying to ingest. Could you please reach out to me at [email protected]? I will follow up with you offline to get a repro and investigate further.
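One way the batch itself can violate the unique index is if the DataFrame contains duplicate values of the unique key. A small illustrative sketch of deduplicating on "unique_ID" before writing (pure Python for clarity; with a Spark DataFrame the equivalent would be df.dropDuplicates(["unique_ID"])):

```python
def dedupe_on_unique_key(rows, key="unique_ID"):
    """Keep only the first row for each value of the unique key,
    so the batch cannot collide with itself on the unique index."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

# Example: two rows share unique_ID=1, so only the first is kept.
batch = [{"unique_ID": 1, "v": "a"},
         {"unique_ID": 1, "v": "b"},
         {"unique_ID": 2, "v": "c"}]
deduped = dedupe_on_unique_key(batch)  # 2 rows remain
```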
@shrey-agarwal please follow up with Sri using the contact above; we will close this issue here but track it through email.