I have a Cosmos DB account on Azure. I inserted 10 rows with primary key "unique_ID" from Databricks using the Spark connector "azure-cosmosdb-spark_2.4.0_2.11-1.3.4-uber.jar".
The Cosmos DB container is configured with unique_ID as a unique key.
Multiple issues:
connectionConfig = {
  "Endpoint" : "https://...",
  "Masterkey" : "...",
  "Database" : "...",
  "preferredRegions" : "East US",
  "Collection" : "...",
  "Upsert" : "true"
}
data.write.format("com.microsoft.azure.cosmosdb.spark").options(**connectionConfig).save()
This fails!
Error: Writing to a non-empty table.
So, I try the following query instead:
data.write.format("com.microsoft.azure.cosmosdb.spark").mode("append").options(**connectionConfig).save()
This fails too!
Error: Unique index constraint violation.
Replacing mode "append" with "overwrite" gives exactly the same error message.
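For context, with the azure-cosmosdb-spark connector the upsert behavior is driven by the "Upsert" option in the write config, combined with append mode. Below is a minimal sketch of the write path described above; the endpoint, key, database, and collection values are placeholders, and whether this avoids the unique-index error in practice is exactly what is in question here:

```python
# Sketch: upsert rows into an existing Cosmos DB container via the
# azure-cosmosdb-spark connector. All account details are placeholders.
writeConfig = {
    "Endpoint": "https://<account>.documents.azure.com:443/",
    "Masterkey": "<master-key>",
    "Database": "<database>",
    "Collection": "<collection>",
    "preferredRegions": "East US",
    # With Upsert=true the connector is expected to replace documents
    # that share an id rather than fail the batch.
    "Upsert": "true",
}

def upsert_to_cosmos(df, config):
    """Append-mode write; append is required once the container has data."""
    (df.write
       .format("com.microsoft.azure.cosmosdb.spark")
       .mode("append")
       .options(**config)
       .save())
```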
How do I go about updating my records?
What I don't understand is: shouldn't "overwrite" at least work and replace the whole collection with the new records?
Any help would be appreciated.
Thanks!
Thanks for your comment. We are actively investigating and will get back to you shortly. Thanks for your patience.
@shrey-agarwal In order to better assist with your query, could you please share the document link you are referencing above?
@KranthiPakala-MSFT I used the following documentation (here).
The doc shows one way to write to Cosmos DB, but makes no mention of "upsert" or of inserting into a non-empty container (which apparently requires a different form of the command, surprisingly).
@shrey-agarwal Thank you for the additional detail. I have added the doc footer to this issue. We are looking into this further.
@shrey-agarwal Are you using the Cassandra API by chance? The following doc shows how to perform an UPSERT using Spark: Upsert data into Azure Cosmos DB Cassandra API from Spark (link)
@Mike-Ubezzi-MSFT No, my application does not use the Cassandra API. For the time being, it's a simple Cosmos DB (Core SQL) account that I created in the Azure portal.
@shrey-agarwal Can you try the following:
// Write to Cosmos DB from the flights DataFrame
import org.apache.spark.sql.SaveMode
flights.write.mode(SaveMode.Overwrite).cosmosDB(writeConfig)
Where you are attempting to use:
data.write.format("com.microsoft.azure.cosmosdb.spark").mode("append").options(**connectionConfig).save()
This repository may be of more assistance here: Azure Cosmos DB Spark Connector User Guide (link)
@Mike-Ubezzi-MSFT - The repository and example you are referring to are for Scala. The equivalent Python syntax for the same query is the one I am already using (Python does not have a SaveMode module).
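To make the Scala-to-Python correspondence concrete: PySpark has no SaveMode enum; DataFrameWriter.mode() takes a string instead. A sketch of the mapping and the Python form of the suggested write (the function and config names here are illustrative, not from the docs):

```python
# Scala SaveMode enum values and the strings PySpark's mode() accepts.
SCALA_TO_PYSPARK_MODE = {
    "SaveMode.Overwrite": "overwrite",
    "SaveMode.Append": "append",
    "SaveMode.Ignore": "ignore",
    "SaveMode.ErrorIfExists": "error",
}

def write_overwrite(df, write_config):
    # Python equivalent of:
    #   flights.write.mode(SaveMode.Overwrite).cosmosDB(writeConfig)
    # The string "overwrite" stands in for SaveMode.Overwrite.
    (df.write
       .format("com.microsoft.azure.cosmosdb.spark")
       .mode("overwrite")
       .options(**write_config)
       .save())
```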
@shrey-agarwal Can you use Scala? And did you try the following:
flights.write.format("com.microsoft.azure.cosmosdb.spark").options(**writeConfig).save()
@Mike-Ubezzi-MSFT This is the exact same query I tried and included in my question initially. It works when the container is empty, but fails with the "non-empty table" error when the container already has data.
As for Scala, I would have to move my code logic and all the processing over, which could take some time; I would prefer a Python solution. Were you able to reproduce the bug?
This is outside my area of expertise. I am looking for a solution for you, but at this time your best bet is to take this to the Cosmos DB MSDN Forum (link). In the meantime, I am assigning this to the document content owner to review the code examples for accuracy, and I have escalated it to the product group.
I would assume append should work since upsert is set to true; I am not sure whether a different method is needed to achieve this operation. I have sent an email to Ram for more details.
@shrey-agarwal - this could be due to a unique key constraint violation within the DataFrame you are trying to ingest. Could you please reach out to me at [email protected]? I will follow up with you offline to get a repro and investigate further.
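One way the batch itself can violate the unique index is if the DataFrame contains duplicate values of the unique key. A small illustrative sketch of deduplicating on "unique_ID" before writing (pure Python for clarity; with a Spark DataFrame the equivalent would be df.dropDuplicates(["unique_ID"])):

```python
def dedupe_on_unique_key(rows, key="unique_ID"):
    """Keep only the first row for each value of the unique key,
    so the batch cannot collide with itself on the unique index."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

# Example: two rows share unique_ID=1, so only the first is kept.
batch = [{"unique_ID": 1, "v": "a"},
         {"unique_ID": 1, "v": "b"},
         {"unique_ID": 2, "v": "c"}]
deduped = dedupe_on_unique_key(batch)  # 2 rows remain
```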
@shrey-agarwal please follow up with Sri using the contact above; we will close this issue here but track it through email.