Azure-docs: Azure Cosmos DB with Databricks Upsert

Created on 27 Jun 2019 · 14 Comments · Source: MicrosoftDocs/azure-docs

I have a Cosmos DB account on Azure. I inserted 10 rows with primary key "unique_ID" via Databricks, using the Spark connector "azure-cosmosdb-spark_2.4.0_2.11-1.3.4-uber.jar".

The Cosmos DB container is configured with unique_ID as its unique key.
Multiple issues:

  1. I am unable to upsert the next set of records, which have the same unique_IDs but different field values. The write I am using:

connectionConfig = {
"Endpoint" : "https://.documents.azure.com:443/",
"Masterkey" : "",
"Database" : "",
"preferredRegions" : "East US",
"Collection": "",
"Upsert" : "true"
}

data.write.format("com.microsoft.azure.cosmosdb.spark").options(**connectionConfig).save()

THIS FAILS!
Error: Writing-to-a-non-empty-table.

So, I use the next query:

data.write.format("com.microsoft.azure.cosmosdb.spark").mode("append").options(**connectionConfig).save()

This FAILS too!
Error: Unique index constraint violation.

Replacing mode "append" with "overwrite" gives exactly the same error message.

How do I go about updating my records?

What I don't understand is: shouldn't overwrite at least work and overwrite the whole database with new records?

Any help would be appreciated.

Thanks!


Pri2 assigned-to-author cosmos-dsvc doc-bug triaged

All 14 comments

Thanks for your comment. We are actively investigating and will get back to you shortly. Thanks for your patience.

@shrey-agarwal In order to better assist with your query, could you please share the document link you are referencing above?

@KranthiPakala-MSFT I used the following documentation (here).
The doc shows a way to write to Cosmos DB, but does not mention "upsert" or inserting into a non-empty Cosmos DB container (which apparently requires a different version of the command, surprisingly).

@shrey-agarwal Thank you for the additional detail. I have added the doc footer to this issue. We are looking into this further.

@shrey-agarwal Are you using the Cassandra API by chance? The following doc shows how to perform an UPSERT action using Spark: Upsert data into Azure Cosmos DB Cassandra API from Spark (link)

@Mike-Ubezzi-MSFT No, my application does not require Cassandra API. For time being, it's a simple Cosmos DB (core SQL) that I have created on the Azure portal resources.

@shrey-agarwal Can you try the following:

// Write to Cosmos DB from the flights DataFrame
import org.apache.spark.sql.SaveMode
flights.write.mode(SaveMode.Overwrite).cosmosDB(writeConfig)

Where you are attempting to use:

data.write.format("com.microsoft.azure.cosmosdb.spark").mode("append").options(**connectionConfig).save()

This repository may be of more assistance here: Azure Cosmos DB Spark Connector User Guide (link)

@Mike-Ubezzi-MSFT - The repository and example you are referring to are for Scala. The equivalent Python syntax for the same write is the one I am using (Python does not have a SaveMode module).

@shrey-agarwal Can you use Scala? And did you try the following:

flights.write.format("com.microsoft.azure.cosmosdb.spark").options(**writeConfig).save()

@Mike-Ubezzi-MSFT This is exactly the query I tried and included in my original question. It works when the container is empty. When the container already holds data, it fails with the "non-empty table" error.

As for Scala, I would have to move my code logic and all the processing to Scala, which could take some time. I would have preferred a Python solution. Were you able to replicate the bug?

This is out of my area of expertise. I am looking to find a solution for you but at this time your best bet is to take this to the Cosmos DB MSDN Forum (link). In the meantime, I am assigning this to the document content owner to review the code examples for accuracy. I have escalated this to the product group.

I would assume append should work since upsert is set to true; I am not sure whether there is a different method to achieve the operation. I sent an email to Ram for more details.
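One possible explanation, offered as an assumption and not a confirmed diagnosis: a Cosmos DB upsert matches an existing document on its `id` (within a partition), not on a unique key. If the rows being written carry no `id`, each one would receive a fresh generated id, be treated as a new insert, and then collide with the unique key on unique_ID. The pure-Python toy below models only that behavior. `ToyContainer` and `UniqueKeyViolation` are hypothetical names invented for illustration, partition keys are ignored, and nothing here calls the real connector or SDK.

```python
import uuid


class UniqueKeyViolation(Exception):
    """Raised when two different ids share the same unique-key value."""


class ToyContainer:
    """Hypothetical in-memory stand-in for a container with one unique key."""

    def __init__(self, unique_key):
        self.unique_key = unique_key
        self.docs = {}  # maps document id -> document

    def upsert(self, doc):
        doc = dict(doc)
        # Assumption: a document without an id gets a generated one.
        doc.setdefault("id", str(uuid.uuid4()))
        # The unique key is checked against every *other* document.
        for other_id, other in self.docs.items():
            same_key = other[self.unique_key] == doc[self.unique_key]
            if other_id != doc["id"] and same_key:
                raise UniqueKeyViolation(doc[self.unique_key])
        # Same id: replace. New id: insert.
        self.docs[doc["id"]] = doc
        return doc["id"]


# Case 1: no id supplied, so the second write looks like a brand-new document.
container = ToyContainer("unique_ID")
container.upsert({"unique_ID": "A1", "value": 1})
try:
    container.upsert({"unique_ID": "A1", "value": 2})
    outcome_without_id = "replaced"
except UniqueKeyViolation:
    outcome_without_id = "unique key violation"
print(outcome_without_id)

# Case 2: id mirrors unique_ID, so the second write replaces the first.
keyed = ToyContainer("unique_ID")
keyed.upsert({"id": "A1", "unique_ID": "A1", "value": 1})
keyed.upsert({"id": "A1", "unique_ID": "A1", "value": 2})
print(keyed.docs["A1"]["value"])
```

If this is the cause, copying unique_ID into a string `id` column before the append write, for example `data.withColumn("id", data["unique_ID"].cast("string"))`, might let the upsert find the existing documents. That suggestion is untested against the connector version named in the question.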

@shrey-agarwal - this could be due to a unique key constraint violation in the DataFrame you are trying to ingest. Could you please reach out to me at [email protected] and I will follow up offline with you to get a repro to investigate further.

@shrey-agarwal please follow up with Sri using the above contact. We will close this issue here but track it through email.
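One way to rule out the cause suggested above is to check whether the batch itself repeats a unique_ID before writing it. The sketch below does this on plain Python dicts standing in for rows; `duplicated_keys` and the sample data are made up for illustration. On a Spark DataFrame, `data.groupBy("unique_ID").count().filter("count > 1")` should give the equivalent answer.

```python
from collections import Counter


def duplicated_keys(rows, key="unique_ID"):
    """Return the sorted key values that occur more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up sample batch: "A1" appears twice, so it would violate a unique key
# even when written to an empty container.
rows = [
    {"unique_ID": "A1", "value": 10},
    {"unique_ID": "B2", "value": 20},
    {"unique_ID": "A1", "value": 30},
]

print(duplicated_keys(rows))
```

An empty result means the batch is internally consistent, and the violation must come from documents already in the container.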

please-close
