Describe the bug
The SQL importer does not allow requests containing a LIMIT statement.
It is safe to forbid keywords like UPDATE or DROP, but LIMIT should be allowed
Mathieu
To Reproduce
Try to import data from a MySQL database containing a (table_name) table : connect to the database, and make a SELECT * FROM (table_name) LIMIT 10;
Current Results
What results occured or were shown.
Expected behavior
A clear and concise description of what you expected to happen or to show.
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
OpenRefine (please complete the following information):
Datasets
If you are allowed and are OK with making your data public, it would be awesome if you can include or attach the data causing the issue or a URL pointing to where the data is.
If you are concerned about keeping your data private, ping us on our mailing list
Additional context
Add any other context about the problem here.
The LIMIT and OFFSET clauses are used internally in OR to restrict the number of rows in the preview and create project SELECT queries.
@tcbuzor yes we understand that - but users should still be allowed to set their own LIMITs and OFFSETs - the code should gracefully handle these (for instance, by replacing a user-provided LIMIT by a smaller LIMIT during preview).
I think it would be reasonable for LIMITs and OFFSETs to be set via a specific control if necessary rather than within a query (thinking this could make it easier to manage rather than having to parse LIMIT/OFFSET statements out of a user authored query)
@wetneb why will you allow the user to enter a limit and then "gracefully replace the limit" ?
Because they could still use a LIMIT that is much higher than the 100 records that is set to keep the Preview screen responsive.
My view would be that it should behave as follows
If they set a limit <100 we should use that for both preview and full data import
If they set a limit >100 we should use 100 for preview, and the user set limit for the full data import
For OFFSET I think we should use a user set value (defaulting to zero as currently) for both preview and full import
More broadly the problem with the current code is that it chunks queries manually with OFFSET and LIMIT when importing a project from a SQL聽query. This sort of chunking can be quite inefficient, depending on the query. I think it would be preferable to use the existing JDBC API for that, with setFetchSize and setMaxRows, to let the JDBC stream the query for us directly. In the case of PostgreSQL at least, this is much more efficient as the driver will use server-side cursors to stream the query.
Let's do it as @ostephens suggested. It make sense for me. Maybe we can optimise for different DB if the performance is a concern.
Oracle's driver also lets you set the cursor options, if you need examples (Coffeetable) https://docs.oracle.com/javase/tutorial/jdbc/basics/retrieving.html#rs_interface