Please explain here below what you were doing when the issue happened
We tested this with two separate CARTO accounts configured in Superadmin to use Here.com geocoding today and symptoms were identical.
We had a spreadsheet (can share if needed) that among other columns has an address and city column of store locations in the USA. We uploaded this by dragging and dropping onto the browser window ("Maps" page). The file was uploaded, automatically geocoded, and the Builder map created. We grew skeptical when we saw points in New York and Alabama that should have been in Tennessee and I investigated further.
I then took the same file, unchecked the "Let CARTO automatically guess data types and content on import." checkbox, and uploaded it. This time, presumably because the auto-guess box was unchecked, the file was imported but not geocoded, and the the_geom column was null for all records. I then created a map with this second file in Builder, applied the Georeference analysis with the appropriate columns, and spot-checked the results. All locations that were supposed to be in Tennessee were, this time, in Tennessee as expected.
Working conclusions:
Even if your account is configured to use HERE as the geocoding provider, if you upload a spreadsheet file with the “Let CARTO automatically guess data types and content on import.” checkbox checked, and your file has some of our auto-recognized column names such as address, your file will be automatically geocoded using _Mapzen_ and not _Here_, and the results will therefore most likely be of poor quality. The workaround is to upload your file with that “automatically guess” checkbox deselected, and then apply a Georeference analysis in Builder. That is the only way you can be sure your data will be geocoded with Here.
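For anyone uploading through the Import API rather than the Builder checkbox, the same workaround might look like the sketch below. It only builds the request and sends nothing. The endpoint path and the `content_guessing` parameter are taken from my reading of the Import API docs and should be verified there; the username and API key are placeholders, and I am assuming the Builder checkbox maps to this parameter (it may toggle others, such as type guessing, as well).

```python
from urllib.parse import urlencode


def build_import_request(username, api_key, content_guessing=False):
    """Return (url, form_fields) for an Import API upload.

    Assumption: the endpoint and the content_guessing field follow the
    Import API docs; check them before relying on this.
    """
    query = urlencode({"api_key": api_key})
    url = f"https://{username}.carto.com/api/v1/imports/?{query}"
    # Content guessing is what triggers the automatic geocoding on import.
    fields = {"content_guessing": "true" if content_guessing else "false"}
    return url, fields


# Placeholder credentials, for illustration only.
url, fields = build_import_request("example-user", "YOUR_API_KEY")
print(url)
print(fields)
```

With guessing disabled the table should arrive with an empty the_geom, ready for the Georeference analysis.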
Please break down here below all the needed steps to reproduce the issue
address column and "auto-guess" checkbox clicked
Please describe here below the current result you got
I can't be certain (as we don't provide metadata yet per @kevin-reilly 's #12371 ), but I'm confident beyond a reasonable doubt that this file is being auto-geocoded with Mapzen, even though the superadmin setting is "heremaps":

Please describe here below what should be the expected behaviour
I would think/hope any account configured to use Here for geocoding would use Here, in all contexts. One possible exception to this might be our geocoding "search box" that appears on maps, which I know is 100% Mapzen across the board for all accounts, but that is perhaps another business decision we should reconsider too.
cc @rochoa @rafatower
The automatic geocoding always happens with the internal geocoder, never with an external provider.
https://github.com/CartoDB/cartodb-management/wiki/Guessing-of-named-places
https://carto.com/docs/carto-engine/import-api/importing-geospatial-data#import-guessing
I don't think anything related to providers belongs in this repository.
Woah. I forgot about the internal geocoder. Did not realize we have automatic geocoding of place names. (Pretty dangerous without transparency of that to the user - another reason for metadata in #12371 !) This does explain why my point located in Jackson, Tennessee (a small city) was geocoded to Jackson, Mississippi (a large city, which probably outranks it in the internal geocoder database).
The CSV had columns (this is a semi-anonymized excerpt) :
id | address | city | st | zip_code
-- | ------- | ---- | -- | --------
1 | 136 Stonebrook Pl | Jackson | TN | 38305
2 | 7689 Poplar Ave. | Germantown | TN | 38138
3 | 349 Brentwood Pike Rd | Brentwood | TN | 37027
4 | 830 James Campbell Blvd South | Columbia | TN | 38402
5 | 314 Paul Huff Pkwy NW | Cleveland | TN | 37312
6 | 728 Thompson Lane | Nashville | TN | 37204
7 | 4435 Summer Ave | Memphis | TN | 38122
They were automatically geocoded to the centroids of:
This means that automatic geocoding of place names is substantially biased toward larger/more populated places. _Even_ if files have fully detailed addresses with states and postal codes and so on to clarify where to geocode, the automatic guessing will place them in the largest city with that city name. This is a huge methodological error because place names are not unique, not even within countries:
https://en.wikipedia.org/wiki/List_of_the_most_common_U.S._place_names
https://en.wikipedia.org/wiki/List_of_popular_place_names
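To make the failure mode concrete, here is a toy sketch. It is not the internal geocoder's actual logic, which I have not seen; the mini-gazetteer and its `size_rank` values are hypothetical and only encode "Jackson, MS is the larger place". It shows how a name-only lookup that prefers the larger place ignores the `st` column that would have disambiguated the row.

```python
# Hypothetical mini-gazetteer; size_rank 1 means "larger place".
# The ranks are illustrative, not real population figures.
GAZETTEER = [
    {"name": "Jackson", "state": "MS", "size_rank": 1},
    {"name": "Jackson", "state": "TN", "size_rank": 2},
]


def lookup_by_name(city):
    """Name-only lookup: the largest place with that name wins."""
    candidates = [p for p in GAZETTEER if p["name"] == city]
    return min(candidates, key=lambda p: p["size_rank"]) if candidates else None


def lookup_with_state(city, state):
    """Qualified lookup: the state column disambiguates the name."""
    for place in GAZETTEER:
        if place["name"] == city and place["state"] == state:
            return place
    return None


# First row of the excerpt above.
row = {"address": "136 Stonebrook Pl", "city": "Jackson", "st": "TN", "zip_code": "38305"}

by_name = lookup_by_name(row["city"])
qualified = lookup_with_state(row["city"], row["st"])
print(by_name["state"], qualified["state"])
```

The name-only path lands in Mississippi, while the qualified path stays in Tennessee.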
I noticed that even though the source dataset's the_geom column had been permanently saved by the automatic geocoding on import, I was still able to apply the Georeference analysis in Builder, and the points were then properly geocoded (all to Tennessee in my sample).
Given that information, this issue is incorrect in its description, but still fairly serious. Without some kind of metadata, or transparency of "automatic geocoding" and which fields it is based on to the user, or other user notification of what's happening, this is a really confusing "feature". @rafatower , would you prefer I closed this and posted a revised issue in https://github.com/CartoDB/data-services instead? Though to be honest this one may be more than a simple issue and need input from UI/UX Design, etc...
The internal geocoder geocodes place names, not addresses, which is why you got that result. We guess the data in columns and, if we detect country names or place names (for example) without too many duplicates and so on, we basically geocode them for free (that's why we use the internal geocoder).
I can come up with two solutions:
I have some other proposals:
Some remarks:
Metadata: for a given query instead of just returning a geometry, we'd need to return several other things. That means changes in the API but also changes in the UX. E.g: is the metadata to be added to the columns of the analysis? do users want to have results below X accuracy or Y accuracy?
Internal geocoder: we all know it can be improved. Are we willing to prioritize work on that? It is the provider of the content guessing. It has its limitations, but it lets us do some things without incurring costs.
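On the metadata remark, a rough sketch of what "returning several other things" could mean. Every name here (`GeocodeResult`, `precision`, `matched_on`, the precision levels) is hypothetical, not an existing API, and the geometry is left empty on purpose.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GeocodeResult:
    # All fields are hypothetical; this is not an existing API.
    geometry: Optional[Tuple[float, float]]  # (lon, lat), None if unresolved
    provider: str                            # e.g. "internal" or "heremaps"
    precision: str                           # one of PRECISION_ORDER
    matched_on: Tuple[str, ...]              # source columns actually used


# Coarsest to finest; an illustrative scale, not a standard.
PRECISION_ORDER = ["place", "postal_code", "street", "address"]


def at_least(result, minimum):
    """True if the result is at least as precise as `minimum`."""
    return PRECISION_ORDER.index(result.precision) >= PRECISION_ORDER.index(minimum)


results = [
    GeocodeResult(None, "internal", "place", ("city",)),
    GeocodeResult(None, "heremaps", "address", ("address", "city", "st", "zip_code")),
]

# Keep only results resolved at street level or better.
usable = [r for r in results if at_least(r, "street")]
print([r.provider for r in usable])
```

Even if this is never exposed as analysis columns, storing `provider` and `matched_on` would answer the "which geocoder did this, and from which fields?" question raised in this thread.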
@saleiva and @kevin-reilly I beg you, please: if you really want the situation to improve we'd need a proper feature doc with high level requirements broken down into smaller requirements with a guarantee of consistency and completeness. And then prioritize the feature.
Another important disclaimer: the more features we add, the slower the process will be.
It'd be interesting to understand how useful the "guessing" is for customers. IMO the metadata about the geocode is interesting (even if we just store those results for our own use) because it opens the door to finding out how accurate results are (and how useful if they are not very accurate)
FYI, just last week I experienced the exact same problem uploading a dataset. The auto geocoding done with the auto-guess box checked placed a large share of points (20%+) in the wrong US state. Could this be a serious problem that customers have without realizing it?
I believe @terrett101 had the same problem just today while trying to do some work for a new customer.
Yes, I've been running into this problem all day. My greater concern though is that customers have no idea that the automatic internal geocoder is looking at cities or places only, and not accurately geocoding street addresses when present in the dataset.
If a customer uploads a dataset which contains street addresses and sees the points plotted on a map, they assume the points are plotted accurately based on the locational information in the file. It's the same assumption I'd been making until today, so I'm sure our customers are making it too.
I'm not opposed to either of Rafa's suggestions. Either removing the guesswork by CARTO (unless perhaps there is a lat/lon in the dataset) or providing the user with georeference options as they import (if no geometry is detected) would be an improvement over the current workflow.
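A minimal sketch of that import-time decision, assuming coordinate columns are detected by name. The column-name sets and the action labels are made up for illustration and are not how the importer currently works.

```python
# Hypothetical column-name hints; the real importer may use different rules.
LAT_NAMES = {"lat", "latitude", "y"}
LON_NAMES = {"lon", "lng", "long", "longitude", "x"}


def import_action(columns):
    """Decide what to do at import time based on column names alone."""
    names = {c.strip().lower() for c in columns}
    if names & LAT_NAMES and names & LON_NAMES:
        # Coordinates are unambiguous, so no guessing is involved.
        return "build_geometry_from_coordinates"
    # No geometry detected: leave the_geom empty and let the user choose
    # how to georeference instead of geocoding place names silently.
    return "ask_user_how_to_georeference"


print(import_action(["id", "address", "city", "st", "zip_code"]))
print(import_action(["id", "Latitude", "Longitude"]))
```

With the store-locations file from this issue, that would mean prompting the user rather than silently geocoding on the city column.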
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.