I started seeing ORA-03135 errors when doing bulk inserts. DBAs have set idle connections to be dropped after 60 seconds. Here are my connection pool settings:
poolAlias: 'deployLog_U_pool',
_enableStats: true,
user: dbconfig.auth.user,
password: dbconfig.auth.password,
connectString: dbconfig.connstr,
poolMax: 24,
queueTimeout: 0, // never terminate queued connection request,
stmtCacheSize: 10000
I have tried tuning the poolTimeout setting to less then, equal to and greater than 60 seconds with no luck. Right now I'm fixing it this way:
/**
* @todo Determine if there's a max retry attempts before terminating the retries
* @param {string} statements SQL statement
* @param {number} fromRow From row of the batch operation of the total records
* @param {number} toRow To row of the batch operation of the total records
* @param {Object} bindParams bind parameters for the SQL
*/
exports.updateBatchNew = function updateBatch(statements, fromRow, toRow, bindParams = {}){
return new Promise((resolve, reject) => {
const data = {};
createConnPool(displayPoolStats = false)
.then((pools) => {
const pool = pools[0];
data.pool = pool;
return pool.getConnection();
})
// .then((conn) => {
// return validateConnection(conn, data.pool);
// })
.then((conn) => {
console.info(`inserting records from ${fromRow} to ${toRow} at ${Date.now()}`);
return conn.execute(statements, bindParams, { autoCommit: true })
.then((results) => {
console.info(`Data from ${fromRow} to ${toRow} committed at ${Date.now()}. Rows affected: ${results.rowsAffected}. Closing connection...`);
resolve(`reporting table updated with rows from ${fromRow} to ${toRow}...`);
return conn.close();
})
.catch((err) => {
console.error(`unable to insert data rom ${fromRow} to ${toRow}`);
if (err.message.indexOf('ORA-03135: connection lost contact') > -1) {
console.error(err);
console.warn('Retrying...');
conn.close();
return updateBatch(statements, fromRow, toRow, bindParams);
}
reject(err);
return conn.close();
})
})
.catch((err) => {
reject(err);
});
});
}
What is your Node.js version? Is it 64-bit or 32-bit?
Node.js version: v8.9.4 (linux x64)
What is your node-oracledb version?
Node-oracledb version: 2.1.2
What is your Oracle client (e.g. Instant Client) version? Is it 64-bit or 32-bit? How was it installed? Where is it installed?
Oracle Client library version: 12.2.0.1.0
64bit client is installed. Installed as per installation instructions from http://www.oracle.com/technetwork/topics/linuxx86-64soft-092277.html#ic_x64_inst
What is your Oracle Database version?
Oracle Database version: 12.1.0.2.0
What is your OS and version?
Ubuntu Server 16.04.4 LTS
What is your compiler version? For example, with GCC, run gcc --version
gcc 5.4.0
What Oracle environment variables did you set? How exactly did you set them?
Added the following line to ~/.profile
export LD_LIBRARY_PATH=/opt/oracle/instantclient_12_2:$LD_LIBRARY_PATH
What is the PATH environment variable (on Windows) or LD_LIBRARY_PATH (on Linux) set to? On macOS, what is in ~/lib?
/opt/oracle/instantclient_12_2:$LD_LIBRARY_PATH
What exact command caused the problem (e.g. what command did you try to install with)? Who were you logged in as?
logged in as a sudo user
What error(s) you are seeing?
ORA-03135
@sdesilvagb Let me ask a few questions to clarify what's going on...
batchUpdate? How is it intended to be used? statements is a single SQL statement?fromRow and toRow don't seem to be used. Am I misunderstanding something?Overall, this looks like a function that's not yet fully implemented. Let's clear that up before continuing.
@dmcghan sure thing.
1. What is the overall goal of batchUpdate? How is it intended to be used?
batchUpdate execute a PL/SQL Associative array binds in promises. Basically I have a large set of data and I use batchUpdate to commit this data to db in batches. In this case it's 10000 records.
statements is a single SQL statement?
2. statements is a single SQL statement?
statements is a single PL/SQL script
DECLARE
TYPE crad_number_type IS TABLE OF NUMBER
INDEX BY PLS_INTEGER;
TYPE crad_shortstring_type IS TABLE OF VARCHAR2(20)
INDEX BY PLS_INTEGER;
TYPE crad_string_type IS TABLE OF VARCHAR(256)
INDEX BY PLS_INTEGER;
T_CIDS crad_number_type := :cids;
T_CRIDS crad_number_type := :crids;
T_EMAILS crad_string_type := :emails;
T_CONTIDS crad_number_type := :contids;
T_ATYPES crad_shortstring_type := :atypes;
T_ADATES crad_string_type := :adates;
T_LIDS crad_number_type := :lids;
BEGIN
FORALL i IN T_CIDS.FIRST..T_CIDS.LAST
INSERT INTO CEM.CAMPAIGN_CONTACT_ACTION_DETAIL (
CAMPAIGN_ID,
CAMPAIGN_RUN_ID,
CONTACT_EMAIL,
CONTACT_ID,
ACTION_TYPE,
ACTION_DATE,
LINK_ID
) VALUES (
T_CIDS(i),
T_CRIDS(i),
T_EMAILS(i),
T_CONTIDS(i),
T_ATYPES(i),
to_date(T_ADATES(i), 'MM/DD/YYYY, HH:MI:SS PM'),
T_LIDS(i)
);
END;
3. fromRow and toRow don't seem to be used. Am I misunderstanding something?
fromRow and toRow are basically used for logging purposes right now. I needed a way to show which batches committed and which failed.
4. Are you creating a pool each time the function is invoked? Or is there other logic that prevents this? I generally recommend creating pools outside of such flows so the code reads better.
createConnPool first checks to see if a pool exists with the same alias, if not it creates the pool.
@sdesilvagb Have you determined when you're getting the ORA-03135? Is it when you first get the connection or after you've been using it?
Can you show how you're calling this? It seems you must be passing different values for fromRow and toRow from somewhere else (some means of iterating all the rows). But your transaction is in updateBatch, which means if something goes wrong, you'll have a hard time fixing things. You'll have to find out what succeeded, what failed, and then redo just the failed rows.
I'd recommend implementing it as a single transaction that does the batching internally. Maybe it would look like this:
function updateBatch(statement, binds, batchArraySize) {
...
}
updateBatch(sql, myArray, 5000);
@dmcghan i see the ORA-03135 after I've been using the connections. I'm implementing this as a last stage in a promise chain. Here's what that looks like:
// some stuff happens
.then(() => {
const promises = [];
// data.flattenReportResults contains the total records I have to insert. typically greater than 10,000
const numPromises = getNumDownloadTimes(data.flattenReportResults.length, batchCommitSize);
let fromRow = 0;
let toRow = data.flattenReportResults.length < batchCommitSize ? data.flattenReportResults.length : batchCommitSize;
const sql = `DECLARE
TYPE crad_number_type IS TABLE OF NUMBER
INDEX BY PLS_INTEGER;
TYPE crad_shortstring_type IS TABLE OF VARCHAR2(20)
INDEX BY PLS_INTEGER;
TYPE crad_string_type IS TABLE OF VARCHAR(256)
INDEX BY PLS_INTEGER;
T_CIDS crad_number_type := :cids;
T_CRIDS crad_number_type := :crids;
T_EMAILS crad_string_type := :emails;
T_CONTIDS crad_number_type := :contids;
T_ATYPES crad_shortstring_type := :atypes;
T_ADATES crad_string_type := :adates;
T_LIDS crad_number_type := :lids;
BEGIN
FORALL i IN T_CIDS.FIRST..T_CIDS.LAST
INSERT INTO CEM.CAMPAIGN_CONTACT_ACTION_DETAIL (
CAMPAIGN_ID,
CAMPAIGN_RUN_ID,
CONTACT_EMAIL,
CONTACT_ID,
ACTION_TYPE,
ACTION_DATE,
LINK_ID
) VALUES (
T_CIDS(i),
T_CRIDS(i),
T_EMAILS(i),
T_CONTIDS(i),
T_ATYPES(i),
to_date(T_ADATES(i), 'MM/DD/YYYY, HH:MI:SS PM'),
T_LIDS(i)
);
END;`
for(let i = 0; i < numPromises; i++){
// optimise later
let cids = [], crids = [], emails = [], contids = [], atypes = [], adates = [], lids = [];
for(let j = fromRow; j < toRow; j++) {
const reportResult = data.flattenReportResults[j];
cids.push(parseInt(reportResult.campaignId));
crids.push(parseInt(reportResult.campaignRunId));
emails.push(reportResult.contactEmail);
contids.push(parseInt(reportResult.contactId));
atypes.push(reportResult.actionType);
adates.push(reportResult.actionDate);
lids.push(reportResult.linkId === null ? null : parseInt(reportResult.linkId));
}
const bindParams = {
cids: { dir: oracledb.BIND_IN, type: oracledb.NUMBER, val: cids },
crids: { dir: oracledb.BIND_IN, type: oracledb.NUMBER, val: crids },
emails:{ dir: oracledb.BIND_IN, type: oracledb.STRING, val: emails },
contids: { dir: oracledb.BIND_IN, type: oracledb.NUMBER, val: contids },
atypes: { dir: oracledb.BIND_IN, type: oracledb.STRING, val: atypes },
adates: { dir: oracledb.BIND_IN, type: oracledb.STRING, val: adates },
lids: { dir: oracledb.BIND_IN, type: oracledb.NUMBER, val: lids }
};
const promise = updateBatchNew(sql, fromRow, toRow, bindParams);
promises.push(promise);
fromRow = toRow
toRow = toRow + batchCommitSize < data.flattenReportResults.length ? toRow + batchCommitSize : data.flattenReportResults.length;
}
return Promise.all(promises);
})
@sdesilvagb Hmmm. That's not the "right" way to do a promise chain. The right way isn't much better, but you maintain control in the JS layer rather than throwing everything over the wall to libuv. I can show you how to do this with promises, but it's a lot better with async/await.
The version of Node.js that you're on supports async/await. Is there a reason you're not using it?
@sdesilvagb Here are some examples...
This is a bad way. Copy/paste this into a file an run it with Node.js. Note that the getRandomNumber is called the full number of iterations, regardless of whether or not there's an error.
const iterations = 5;
function getRandomNumber() {
return new Promise(function(resolve, reject) {
console.log('getRandomNumber called');
setTimeout(function() {
const randomValue = Math.random();
const error = randomValue > .9 ? true : false;
if (error) {
reject(new Error('Ooops, something broke!'));
} else {
resolve(randomValue);
}
}, 2000);
});
}
let promiseChain = [];
for (let x = 0; x < iterations; x += 1) {
promiseChain.push(getRandomNumber());
}
Promise.all(promiseChain)
.then(vals => {
console.log(vals);
console.log('All done!');
})
.catch(err => {
console.log(err);
});
Compare that to this example:
const iterations = 5;
function getRandomNumber() {
return new Promise(function(resolve, reject) {
console.log('getRandomNumber called');
setTimeout(function() {
const randomValue = Math.random();
const error = randomValue > .9 ? true : false;
if (error) {
reject(new Error('Ooops, something broke!'));
} else {
resolve(randomValue);
}
}, 2000);
});
}
let count = 0;
let promiseChain = Promise.resolve(); // Have to init the chain
for (let x = 0; x < iterations; x += 1) {
promiseChain = promiseChain // yes, this is a thing :)
.then(getRandomNumber)
.then(n => {
count += 1;
console.log(count, n);
});
}
promiseChain
.then(() => {
console.log('All done!');
})
.catch(err => {
console.log(err);
});
When you run this, you'll see that the getRandomNumber function is only called if the previous invocation doesn't throw an error.
You'll also note that this is sequential whereas the previous example was in parallel. With the driver, the connection acts as a serialization device, so the behavior will sequential no matter how it's written.
Okay, now see how this could be done with async/await...
const iterations = 5;
function getRandomNumber() {
return new Promise(function(resolve, reject) {
console.log('getRandomNumber called');
setTimeout(function() {
const randomValue = Math.random();
const error = randomValue > .9 ? true : false;
if (error) {
reject(new Error('Ooops, something broke!'));
} else {
resolve(randomValue);
}
}, 2000);
});
}
async function doWork() {
for (let x = 0; x < iterations; x += 1) {
let n = await getRandomNumber();
console.log(x + 1, n);
}
}
doWork()
.then(() => {
console.log('All done.');
})
.catch(err => {
count += 1;
console.log(count, err);
});
This is similar to the second example in that we maintain control in the JS layer. However, there are no awkward promise chains to reason about. We just put async in front of function and then we can use await in a loop! :)
Note that async functions return promises, so you have to plan for that.
@dmcghan no particular reason other than I鈥檓 more familiar with Promises than async/await. When you say maintain control in the JS layer, what do you mean exactly?
Also, thanks so much for your help. If you can illustrate what you mean with promises I鈥檇 really appreciate it.
@sdesilvagb
no particular reason other than I鈥檓 more familiar with Promises than async/await
That makes good sense. Async/await builds on top of promises. So, since you're familiar with promises, you're ready to make your life easier with async/await! :)
When you say maintain control in the JS layer, what do you mean exactly?
When you read/write files, do DNS lookups, and a variety of other things, you use libuv's thread pool. So let's say you needed to insert a million rows and you did this with 1 million calls to execute and Promise.all to control the flow. Now, what if the very first row errors out?
Well, Promise.all will short circuit, but there are still 999,999 operations queued up to execute. Of course, they will all fail, but we don't have a way of saying: "hey libuv, don't execute any of those queue operations" (queued because a connection can only do 1 thing at a time).
Does that help? By the way, we'll be talking more about this during our next Office Hours session, maybe you can join?
https://devgym.oracle.com/pls/apex/dg/office_hours/727
If you can illustrate what you mean with promises I鈥檇 really appreciate it.
I put the examples up just before your questions (in case you didn't see them).
@sdesilvagb One last thing, I wrote about async/await here.
Thanks for all your help @dmcghan. I now know how to do a sequential flow using Promise.all. But I think for this use case parallelization is the way to go especially considering the number of records we have to insert. Whether I execute is sequentially or parallelly (is that even a word??) I need to re-execute those statements that have failed. So I'll catch those errors and re-execute the batch.
At least now I understand the issue a bit more. Connections are returned from the pool in parallel, execute is called on those connections in parallel. However, if there is a net dropout between obtaining a connection and execution or if there's a delay longer that 60 sec (set by our DBAs to timeout idle connections), execute is called on a faulty connection thus resulting the ORA-03135
I will definitely join the next office hour sessions.
Some miscellaneous comments:
since the pool is shared resource there is a little synchronization when you get or release connections, so I wouldn't 100% agree with 'Connections are returned from the pool in parallel'. Whether you notice this latching will depend on load factors and expectations - and what you mean by parallel.
make sure you close connections that are not in use, and get new connections just before you use them. This will avoid your DBA's timeout biting you.
once you get an error like ORA-03135 you can close the connection and get a new one. The pool will do some clean up on the dead connection.
if you are loading large sets of records I'd recommend using Data Pump which is now part of Instant Client. My second choice would be Python cx_Oracle which has executeMany() to efficiently insert data. (I know @anthony-tuininga and @dmcghan have started discussing add this to node-oracledb; it is being tracked in https://github.com/oracle/node-oracledb/issues/614)
@sdesilvagb
I now know how to do a sequential flow using Promise.all
Not quite. You can't do sequential flows with Promise.all - it's out of your control and the async operations will typically be done in parallel. It's just that some async work, like multiple execute calls on a single connection, will be done sequentially.
But I think for this use case parallelization is the way to go especially considering the number of records we have to insert.
This statement made me revisit your code. I would typically write this kind of operation using a single connection to do all the work. A single connection gives me a single session and transaction so I can make the entire unit of work succeed or fail at the same time. But yes, you're basically getting multiple connections to process multiple batches at the same time.
What do you typically set batchCommitSize to? What are you setting UV_THREADPOOL_SIZE to? The default thread pool size is 4. If you're not increasing that, only 4 of your 24 connections will be able to work at the same time.
So I'll catch those errors and re-execute the batch.
I see you have retry logic, but what if the DB goes down during a retry - how would you sort that out? That's just something to ponder... A single transaction makes this simple.
@cjbj Thanks for providing that link on Data Pump, I didn't know it was bundled with 12.2 Instant Client - very cool! :)
@sdesilvagb While Data Pump isn't based on JavaScript/Node.js, it will almost certainly be the fastest way to get the data into the database. Maybe worth looking into. I'll see if I can put together a demo for the Office Hours session.
@dmcghan
What do you typically set
batchCommitSizeto? What are you setting UV_THREADPOOL_SIZE to?
The default thread pool size is 4. If you're not increasing that, only 4 of your 24 connections will be able to work at the same time.
batchCommitSize is currently set to 10,000 and UV_THREADPOOL_SIZE is set to 32
I see you have retry logic, but what if the DB goes down during a retry - how would you sort that out? That's just something to ponder... A single transaction makes this simple.
I'm executing this via Apache Airflow so if the db goes down it will retry the entire job after a certain delay.
I would typically write this kind of operation using a single connection to do all the work. A single connection gives me a single session and transaction so I can make the entire unit of work succeed or fail at the same time.
Doesn't this mean that subsequent execute statements will have to wait until the previous ones are committed? Since I'm utilizing an PL/SQL procedure I could commit all at once instead of committing them in batches. I'll give that a try.
@cjbj
once you get an error like ORA-03135 you can close the connection and get a new one. The pool will do some clean up on the dead connection.
I tried implementing a validation procedure just before I call the PL/SQL script like this with no luck:
function validateConnection(conn, pool) {
return conn.execute('SELECT 1 FROM dual')
.then((results) => {
return conn;
})
.catch((err) => {
return validateConnection(pool.getConnection(), pool);
});
}
I called this in a then() just after pool.getConnection()
@sdesilvagb
batchCommitSizeis currently set to 10,000 and UV_THREADPOOL_SIZE is set to 32
Okay, that looks good. What's the typical upper limit of data.flattenReportResults.length? How long does a single batch of 10k usually take?
I'm executing this via Apache Airflow so if the db goes down it will retry the entire job after a certain delay.
Hadn't heard of Airflow before... My point was really along the lines of: If you use more that one connection/transaction and some but not all of the work succeeds and commits before the database goes down, you likely couldn't just repeat the entire job because part of it already succeeded. This may or may not be an issue for your app.
Doesn't this mean that subsequent execute statements will have to wait until the previous ones are committed?
Yes, each batch would be sequential. You'd have to test to see what the overall performance impact would be.
I tried implementing a validation procedure just before I call the PL/SQL script like this with no luck
What does "no luck" mean? validateConnection will not work as written (the return isn't async). You could implement it like this:
function validateConnection(conn, pool) {
return new Promise((resolve, reject) => {
conn.execute('SELECT 1 FROM dual')
.then((results) => {
resolve(conn);
})
.catch((err) => {
return validateConnection(pool.getConnection(), pool);
});
});
}
Note that reject isn't called so this could result in a never-ending loop.
It would be easier to implement this with an async function (which returns a promise automatically):
async function validateConnection(conn, pool) {
const maxRetries = 10;
for (let x = 0; x < maxRetries; x += 1) {
try {
await conn.execute('SELECT 1 FROM dual');
return conn;
} catch (err) {
conn = await pool.getConnection();
}
}
throw new Error('Max retries exceeded');
}
With async functions, it's easy to add max retry logic and avoid recursion.
I haven't tested either of these functions, they are just examples.
What's the typical upper limit of data.flattenReportResults.length? How long does a single batch of 10k usually take?
typical upper limit is around 150,000. Although there were times I've seen it reach 400,000. I've only seen the error when there are about 400,000 records to process. Single batch of 10k usually takes a 10-20 seconds.
validateConnection will not work as written (the return isn't async)
Doesn't conn.execute returns a promise anyways? Why wrap it in another one?
Doesn't conn.execute returns a promise anyways? Why wrap it in another one?
Ah, I missed the first return. Yeah, that should work! 馃憖 What did you mean by "no luck"? Did you get valid connections that errored out during the transaction? Have you reviewed SQL*Net and RDBMS trace files for hints?
Although there were times I've seen it reach 400,000. I've only seen the error when there are about 400,000 records to process.
That's interesting. At 150,000 rows, you'd need 15 connections and threads, which you have enough of. But when you get to 400,000 you need 40 connections and threads, but you only have 24 connections (plenty of threads for that number of connections). So you end up throwing the work over to the thread pool queue hoping everything will work out. I've never seen good things happen in this case.
I recommend moving the queue to your code. The Async module has a simple queue function that you can use. Rather than do an example based on Data Pump today, I'll try to put together an example of using Async's queue to do something similar to what you're doing.
What did you mean by "no luck"? Did you get valid connections that errored out during the transaction?
I meant even after validation I was still getting 3135 errors. I was getting valid connections but those valid connection stayed on the queue for more than 60 seconds I'm guessing.
So you end up throwing the work over to the thread pool queue hoping everything will work out. I've never seen good things happen in this case.
Haha good point. Is requests queueing also managed by libuv?
@sdesilvagb Not sure I'll be able to get the demo done in time... :( But I can finish after. Question for you: where do the rows come from - a network call or the file system?
Is requests queueing also managed by libuv?
We do request queueing in the JS layer with a custom queue. The default is to wait for 60 seconds to get a connection. But if that was what timed out you'd get an NJS-040 error.
where do the rows come from - a network call or the file system?
Via a network API call @dmcghan
@dmcghan You were right about using one connection. There was negligible difference in using one connection vs using multiple connections to execute data in batches. Thanks again for your help.
Is this fixed now?
@mohawk2 Are you getting intermittent ORA-03135 errors? If so, please create a new issue and tell us about your situation.
@sdesilvagb Just curious, are you getting intermittent ORA-03135 errors now that you're only using one connection?
@dmcghan Nope. At least I haven't run into them since I started using one connection.
My question was based on looking at this as part of a scan of Oracle-interfaces in other languages, and seeing this one looked like it was fixed. If the OP (@sdesilvagb) considers it fixed, they could close it?
Consider it closed.
@sdesilvagb node-oracledb 2.2 introduces connection.executeMany() (also see Batch Statement Execution) making batch data insert more efficient.
@cjbj Excellent! Does this offer any advantages over executing batches via PL/SQL associative arrays? I'm assuming this feature is an abstraction on top of that.
It definitely offers advantages over executing batches via PL/SQL associative arrays and this new feature is not an abstraction on top of that. Using PL/SQL associative arrays was a workaround. The two primary advantages I can think of immediately are that types are not restricted (associative arrays cannot handle all types) and that the data doesn't need to be re-arranged into columnar format. The performance difference should be negligible in most cases, but the fact that the PL/SQL engine doesn't need to be invoked may also prove an advantage. I guess another advantage is that you don't have to obscure your code by writing a PL/SQL wrapper around it, too!
Hope that's of some help.
Most helpful comment
Consider it closed.