Pandas: merge_asof on multiple columns?

Created on 15 Mar 2018  路  10Comments  路  Source: pandas-dev/pandas

Thank you very much for your work Pandas people! I have the error pandas.errors.MergeError: can only asof on a key for left when I run the code below:

df1.sort_values(['gpsLongitude','gpsLatitude'], inplace=True)
df2.sort_values(['x','y'], inplace=True)

res = pd.merge_asof(df1, df2, left_by=['gpsLongitude','gpsLatitude'], right_by=['x','y'] )

Problem description

it seems that merge_asof does not accept list of columns to merge on. Documentation seems to say that the left_on argument is a "label" and not a list.

So maybe what I am asking is for an enhancement? It just seems weird that it can only merge on 1 column.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 36.6.0
Cython: None
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.3.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Enhancement Reshaping

Most helpful comment

how would this work? this seems like an impossible request

take the OP as an example: you need some way to trade off 'closeness' in one field and 'closeness' in the second if there are no exact matches.

I would suggest defining and calculating a common metric yourself. for example, the haversine distance between two lat/longs. of course, that would mean calculating the distance between every point possible, which kind of defeats the purpose of merge_asof

what might be more helpful is some combination of regular merge and asof, e.g. exact match on one pair of fields, nearest match on another

All 10 comments

Very actual

I looked into this a little bit and by removing these checks, I was able to merge on multiple keys and it seems to work, also with direction and tolerance arguments. Here's my monkey patch code. However I'm not sure whether the results are correct/as you would expect. I think it may just use the last key in the list. I think that may be changed here but that results in an 'truth value of array is ambiguous' error down the line.

mmm the checks seem pretty clear that it can only work with 1 key, wonder why but I guess there must be a reason! If you look into the tests for merge_asof they all use just 1 key per side.

Hi,

I also got same problem when trying to select multiple keys. Great if we can have a fix for this.

I've just stumbled upon this issue as well. I wonder if it would be practical to allow for this?

Hi,

I have come across this issue as well. It will be great if we can fix for this.

+1 on this, I think it would be a great enhancement if possible.

how would this work? this seems like an impossible request

take the OP as an example: you need some way to trade off 'closeness' in one field and 'closeness' in the second if there are no exact matches.

I would suggest defining and calculating a common metric yourself. for example, the haversine distance between two lat/longs. of course, that would mean calculating the distance between every point possible, which kind of defeats the purpose of merge_asof

what might be more helpful is some combination of regular merge and asof, e.g. exact match on one pair of fields, nearest match on another

There is no need to calculate distance between EVERY point if one specifies tolerance criteria for separate merge keys. The common metric could be added as another parameter (function taking arbitrary number of arguments).

Since the asof merge requires pre-sorted data, wouldn't a multi-key merge be the same as merging on a single column of sorted tuples?

Was this page helpful?
0 / 5 - 0 ratings