Thank you very much for your work Pandas people! I have the error pandas.errors.MergeError: can only asof on a key for left
when I run the code below:
df1.sort_values(['gpsLongitude','gpsLatitude'], inplace=True)
df2.sort_values(['x','y'], inplace=True)
res = pd.merge_asof(df1, df2, left_by=['gpsLongitude','gpsLatitude'], right_by=['x','y'] )
it seems that merge_asof
does not accept list of columns to merge on. Documentation seems to say that the left_on
argument is a "label" and not a list.
So maybe what I am asking is for an enhancement? It just seems weird that it can only merge on 1 column.
pd.show_versions()
commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 36.6.0
Cython: None
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.3.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Very actual
I looked into this a little bit and by removing these checks, I was able to merge on multiple keys and it seems to work, also with direction
and tolerance
arguments. Here's my monkey patch code. However I'm not sure whether the results are correct/as you would expect. I think it may just use the last key in the list. I think that may be changed here but that results in an 'truth value of array is ambiguous' error down the line.
mmm the checks seem pretty clear that it can only work with 1 key, wonder why but I guess there must be a reason! If you look into the tests for merge_asof they all use just 1 key per side.
Hi,
I also got same problem when trying to select multiple keys. Great if we can have a fix for this.
I've just stumbled upon this issue as well. I wonder if it would be practical to allow for this?
Hi,
I have come across this issue as well. It will be great if we can fix for this.
+1 on this, I think it would be a great enhancement if possible.
how would this work? this seems like an impossible request
take the OP as an example: you need some way to trade off 'closeness' in one field and 'closeness' in the second if there are no exact matches.
I would suggest defining and calculating a common metric yourself. for example, the haversine distance between two lat/longs. of course, that would mean calculating the distance between every point possible, which kind of defeats the purpose of merge_asof
what might be more helpful is some combination of regular merge and asof, e.g. exact match on one pair of fields, nearest match on another
There is no need to calculate distance between EVERY point if one specifies tolerance criteria for separate merge keys. The common metric could be added as another parameter (function taking arbitrary number of arguments).
Since the asof merge requires pre-sorted data, wouldn't a multi-key merge be the same as merging on a single column of sorted tuples?
Most helpful comment
how would this work? this seems like an impossible request
take the OP as an example: you need some way to trade off 'closeness' in one field and 'closeness' in the second if there are no exact matches.
I would suggest defining and calculating a common metric yourself. for example, the haversine distance between two lat/longs. of course, that would mean calculating the distance between every point possible, which kind of defeats the purpose of merge_asof
what might be more helpful is some combination of regular merge and asof, e.g. exact match on one pair of fields, nearest match on another