We are using Stack Driver to log errors for Firebase functions. We unpredictably get the error in the title. Typically it happens the first to we try to log an error after a function re-deploys. Subsequent error log writes will go through to StackDriver without issue, but occasionally we'll get the error again.
We're using "@google-cloud/logging": "^1.0.2", and deployed via Firebase Functions.
Below is our module that implements the logging...
Anybody have any idea what is causing this?
const Logging = require('@google-cloud/logging');
// To keep on top of errors, we should raise a verbose error report with Stackdriver rather
// than simply relying on console.error. This will calculate users affected + send you email
// alerts, if you've opted into receiving them.
const logging = Logging();
// This is the name of the StackDriver log stream that will receive the log
// entry. This name can be any valid log stream name, but must contain "err"
// in order for the error to be picked up by StackDriver Error Reporting.
const logName:string = 'errors-fb-func';
// Enum of StackDriver severities
enum Severities {
ERROR = 500, // ERROR (500) Error events are likely to cause problems.
CRITICAL = 600, // CRITICAL (600) Critical events cause more severe problems or outages.
ALERT = 700, // ALERT (700) A person must take an action immediately.
EMERGENCY = 800 // EMERGENCY (800) One or more systems are unusable.
}
// Provide an error object and and optional context object
export function log(err:Error, logLevel:number=Severities.ERROR, user?:string): Promise<any> {
// https://cloud.google.com/functions/docs/monitoring/error-reporting#advanced_error_reporting
const FUNCTION_NAME = process.env.FUNCTION_NAME;
const log = logging.log(logName);
const metadata = {
// MonitoredResource
// See https://cloud.google.com/logging/docs/api/ref_v2beta1/rest/v2beta1/MonitoredResource
resource: {
// MonitoredResource.type
type: 'cloud_function',
// MonitoredResource.labels
labels: {
function_name: FUNCTION_NAME
}
},
severity: logLevel
};
const context:any = {};
if (user && typeof user === 'string') {
// ErrorEvent.context.user
context.user = user;
}
// ErrorEvent
// See https://cloud.google.com/error-reporting/reference/rest/v1beta1/ErrorEvent
let structPayload:any = {
// ErrorEvent.serviceContext
serviceContext: {
// ErrorEvent.serviceContext.service
service: `cloud_function:${FUNCTION_NAME}`,
// ErrorEvent.serviceContext.version
resourceType: 'cloud_function'
},
};
if (context) {
// ErrorEvent.context
structPayload.context = context;
}
structPayload.message = getMsgForError(err);
return writeLog(log, metadata, structPayload);
}
function getMsgForError(error:Error): string {
// https://cloud.google.com/functions/docs/monitoring/error-reporting#advanced_error_reporting
// ErrorEvent.message
if (error instanceof Error && typeof error.stack === 'string') {
return error.stack;
} else if (typeof error === 'string') {
return error;
} else if (typeof error.message === 'string') {
return error.message;
} else {
logFatalError(error, "Error message type not supported");
return "";
}
}
function writeLog(log:any, metadata:any, structPayload:any): Promise<any> {
console.log(metadata);
console.log(structPayload);
// Write the error log entry
return new Promise((resolve, reject) => {
try {
log.write(log.entry(metadata, structPayload), (error:any) => {
if (error) {
logFatalError(error);
reject(error);
}
resolve();
});
} catch(error) {
reject(error);
}
});
}
// Utility function to log error if Logger fails
function logFatalError(error:Error, msg?:string): void {
console.error(error, msg);
throw error;
}
// error, crtical, alert, emergency, accept an Error object
// And then set error.stack as the message
export function error(error:Error, user?:string): Promise<any> {
return log(error, Severities.ERROR, user);
}
export function critical(error:Error, user?:string): Promise<any> {
return log(error, Severities.CRITICAL, user);
}
export function alert(error:Error, user?:string): Promise<any> {
return log(error, Severities.ALERT, user);
}
export function emergency(error:Error, user?:string): Promise<any> {
return log(error, Severities.EMERGENCY, user);
}
@ofrobots ever run into anything like this before?
As an update, it seems to be more of an issue the first time a call is made to Stackdriver after a Firebase function is updated and restarted. Subsequent requests seem to go through ok. Although it does re-emerge sporadically. We really haven't been able to get to the bottom of it.
Unfortunately, this causes the functions to timeout and throw an "Unhandled Exception", which is not going to work for us in production. So unless we can resolve this issue, we will have to replace Stackdriver with another logger.
I too am hoping @ofrobots will have an idea. :-)
@deremer Can you provide the actual stack trace?
Here is one log entry where we see this...
{
insertId: "000000-6ca1de21-3818-4c66-a620-a8a02227e1a7"
labels: {
execution_id: "134720017564250"
}
logName: "projects/REDACTED/logs/cloudfunctions.googleapis.com%2Fcloud-functions"
receiveTimestamp: "2017-07-10T00:12:20.477133425Z"
resource: {
labels: {
function_name: "LotteryActions"
project_id: "REDACTED"
region: "us-central1"
}
type: "cloud_function"
}
severity: "ERROR"
textPayload: "{ Error: Endpoint read failed
at /user_code/node_modules/@google-cloud/logging/node_modules/grpc/src/node/src/client.js:569:15 code: 14, metadata: Metadata { _internal_repr: {} } } undefined"
timestamp: "2017-07-10T00:12:14.178Z"
}
I've been wondering if we're setting metadata incorrectly or something because of the metadata: Metadata { _internal_repr: {} } } undefined
But here's another one where it doesn't have that...
{
insertId: "000001-901cb19e-a4cb-4c78-a339-360408edb7d8"
labels: {
execution_id: "134720017564250"
}
logName: "projects/REDACTED/logs/cloudfunctions.googleapis.com%2Fcloud-functions"
receiveTimestamp: "2017-07-10T00:12:20.477133425Z"
resource: {
labels: {
function_name: "LotteryActions"
project_id: "REDACTED"
region: "us-central1"
}
type: "cloud_function"
}
severity: "ERROR"
textPayload: "Error: Endpoint read failed
at /user_code/node_modules/@google-cloud/logging/node_modules/grpc/src/node/src/client.js:569:15"
timestamp: "2017-07-10T00:12:14.179Z"
}
/cc @GoogleCloudPlatform/node-team @murgatroid99
The error indicates a TCP read error – which suggests that sporadically there are some network issues in the environment?
@swalkowski @jasonpolites: any ideas if this could be an issue on Functions?
@ofrobots Just so future readers are 100% clear on this... This code is being run as Firebase Functions, so it's 100% within the Google Cloud environment.
Note: the part with the metadata is probably irrelevant. That is trailing response metadata, which is often empty.
It seems like the gRPC client successfully establishes a TCP connection (if it didn't, the error would say "Connect failed"), then the first attempt to read from that TCP connection fails for some reason, and the request it was a part of failed.
It might help to see more details of the TCP error, but that doesn't propagate far enough up the stack. It is possible to see it (along with a torrent of other information) by setting the environment variables GRPC_TRACE=tcp GRPC_VERBOSITY=DEBUG. The trace information from that gets dumped to stderr, so if you have access to that, that could help (though it could also expose too much information).
@murgatroid99 any idea how to set an env var like that with Firebase Functions...or through the Cloud Function console?
Put it in runtime.config.json?
I am not familiar with Cloud Functions, but you should just be able to do process.env.GRPC_TRACE='tcp'; and process.env.GRPC_VERBOSITY=DEBUG'; in the JS file before loading the logging module.
We were experimenting with a few things and one thing we try is moving const logging = Logging(); inside the log() function. Since we've done that we haven't seen the error.
If ok, can we leave this open for a couple of days so that I can report back on if that fixed it or not?
If that is in fact the issue, then Firebase may want to update their docs: https://firebase.google.com/docs/functions/reporting-errors#importing_dependencies
Update: since we moved const logging = Logging(); inside the log() function, we have not seen this error.
So I believe we have enough sample data to declare that a fix. Now whether or not that TCP error should in fact happen, is up to you guys to determine :-)
I'm good to close this, but I'll let you guys at Google decide if it should be closed, since you probably want to get the Firebase guys to update their docs (https://firebase.google.com/docs/functions/reporting-errors#importing_dependencies) and their sample (https://github.com/firebase/functions-samples/blob/master/stripe/functions/index.js)
I also logged an issue pointing to this issue for firebase: https://github.com/firebase/functions-samples/issues/194
There are some related issues with other GCP node modules related to auth tokens expiring, which fits with the "fix" of moving initialization to function-scope, but does not fit with the "we only see if after re-deploy". On the other hand, network issues experienced at startup only might be related to slower-than-expected spool up of network resources, but that doesn't fit with the behavior of the "fix". More investigation on our (Google) end needed methinks.
For me 'Error: "Endpoint read failed"' is often reported in Stackdriver when I call the Spanner API from a GCP Function which has been erroneously deployed with local node_modules from my computer (instead of allowing GCP to create the dependencies).
We are deploying via Firebase and accoriding to this link, "If you are deploying through the Firebase CLI, the local node_modules folder will be ignored, and you must specify your dependencies in package.json"
I get the error occasionally even with the correct deps. Yesterday was especially bad. This morning the error has been replaced with '"Function execution took 242 ms, finished with status: 'connection error'"' and the last log before that is the call to database.runTransaction. I do some more investigation and post the results here
I have issues using GCP Functions and Spanner with JS client 0.7.1. I sometimes receive errors invoking GCP Functions through Postman or using the testing screen from within the GCP Functions web console. The same Function works fine if I invoke it locally during periods when errors are occurring with the remote code. This happens around 5-10% of the time but it is not consistent, sometimes there are periods of no errors. I am initializing the JS client using these instructions "https://cloud.google.com/spanner/docs/getting-started/nodejs/". At the moment the error seems to happen when the Function is cold i.e. after running it a few times with no errors, wait for 10min or so, run it again and see the errors, run it a few more times and see no errors.
The errors are:
database.run
Error: Endpoint read failed\",\"error_code\":14,\"query\":\"SELECT
Error: Session not found: projects/REDACTED/instances/lc-data/databases/lc-data-uat/sessions/AFZejKDAPJhREDACTEDYnFRwi7iMjT0_","error_code":5,"query":"SELECT
database.runTransaction
Error: Endpoint read failed","error_code":14
connection error'"
Thanks to all for explaining their situations in great detail. Please correct any mistakes I might make in my assumptions:
@google-cloud/* module instantiation code to within the functionnode_modules directory@8357753 -- you elaborated on your situation further in #2489, and a few others are involved in a discussion, so we'll pursue the specifics there.
@jasonpolites -- do you still think there needs to be an investigation? Without much experience with GCF, my best interpretation of the situation now is that GCF is doing some type of caching of either old code or stale gRPC connections. If that's the case, I don't think we can solve much from GCN. If that's not the case, any help getting me up to speed on how I might be looking at this wrong would be appreciated. Thanks!
@stephenplusplus FIY I'm currently experiencing the issue even instantiating within the function and deploying with Firebase. It happens every time after I redeploy my functions.
Something like:
const pubsub = require('@google-cloud/pubsub')
exports.endpoint = functions.https.onRequest((req, res) => {
const pubsubClient = pubsub()
// ...
pubsubClient.createTopic(name, (err, topic) => {
topic.publish({ .... })
}
})
Bit of a showstopper :/
cc @lukesneeringer @bjwatson I'm not sure how to proceed on this one.
I've also discovered that in our cloud functions write.log() is taking about 15s to log to StackDriver... If I revert to console.error it is almost instantaneous. I've zeroed the delay in precision on to the call of log.write. If I print to console just before that line and then just after it is routinely 10-15s.
We thought degraded performance of our functions was related to load, when in fact it seems to just be this method call...
@deremer: Does the execution of your cloud function completion callback depend upon the settling of the promise returned by your writeLog function?
@ofrobots generally our functions complete by returning a promise, and occasionally we do return the function log() as shown above which in turn returns the promise to write the log.
I should also note that we have since updated the writeLog function as shown below because we realized it returns a promise if no callback is provided...
function writeLog(syslog:any, metadata:any, structPayload:any): Promise<any> {
// Write the error log entry
return syslog.write(syslog.entry(metadata, structPayload));
}
@ofrobots To clarify more, in our function we were writing to the error log in some situations before continuing on to do other work. Our functions get very high traffic at certain times and we thought the increased latency was just the functions getting overloaded... but it turns out that the increased latency only affected calls that were trying to write to StackDriver...
As mentioned... log.write() takes 10-15s every time. So if a function was waiting for the promise to resolve before continuing it was just sitting there waiting.
@deremer 10-15s to settle the log.write promise is unexpected for me. Let me take a look into this.
No problem, please let me know if you want more info or to reach out to me directly. I modified our code to use console.error instead of our StackDriver logger and our spike today ran beautifully with nearly all of the latency spike removed... so more evidence that our problem was this log write.
One more question: Are there any other log.write operations in your cloud function? Or the only single write operation happens to be the very last thing that the function does?
@ofrobots We wrap log.write in the module posted at the top of this issue (note that it has evolved from what is posted, but generally the same), and then we import it as "Logger".
Throughout our code we do conditionally call Logger.error(error, userId) in some situations. So, yes it is possible that our function may write more than once in a single invocation. As far as I can tell, these always have the latency issue.
I should note that if we call Logger.error, etc. we always await it before continuing (we use typescript and webpack)
@ofrobots Also, I don't know if the excessive log write time is a different issue or not...happy to move it out to not pollute this issue.
I do believe it is a different issue. I have split it out here: https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2544.
As for the original issue with tcp timeouts / endpoint read failures: those are due to the fact that cloud functions get aggressively throttled once you call the completion callback. This means that any background work still pending once you call the completion callback is not guaranteed to be able to execute.
The solution is to ensure that you make sure out outstanding work that you care about (e.g. log.write as above) is done before you call the completion callback.
FYI @timburks https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2438#issuecomment-323404472
To all who can help;
Does anyone have a surefire way to reproduce this issue?
Without a clear path on how to reproduce, I'm stuck on how to resolve. This seems like an issue either with GRPC or the GCF environment. I don't believe there is much we can do from our library.
@lukesneeringer please advise and re-open if necessary.
More reports coming in, let's reopen until we have more certainty we cannot resolve this from our codebase.
@stephenplusplus I have pinned to version 0.14.0 of the PubSub client library and the issue described here https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2728 does not occur so somewhere between that version and 0.14.5 CloudFunction support has broken. BTW issue #2438/#2489 are > 3 months old and PubSub was working a month ago so this may or may not be related. I think it is odd to close #2728 and and #2489 in preference to this issue because they all seem different to me.
If it helps I can test later versions of the PubSub client, just let me know which you want me to test
I have pinned to version 0.14.0 of the PubSub client library and the issue described here #2728 does not occur so somewhere between that version and 0.14.5 CloudFunction support has broken.
@callmehiphop Anything you can think of? I still think this will trace back to the problem that you originally identified @ChrisBartonLC (https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2728#issuecomment-342305105) -- GCF + gRPC can be a problem, which is what this issue is meant to track.
Actually yes, there was one PR specifically (https://github.com/GoogleCloudPlatform/google-cloud-node/pull/2627) where we updated the gRPC version from 1.3 to 1.6 and started observing the gRPC channel state when opening streams.
In case anyone is waiting to hear back ;) It still is having these issues with cloud functions. When I use the same code but inside of an App Engine instance - no issues.
I was able to reproduce the issue after enabling the grpc debug output. My observation is that, if the channel is idle for 4 minutes, we get the "End Point read Failed". If the idle time is less than 4 min, its not an issue.
In my case, I am trying to access Spanner from my GCE instance using gRPC. I just upgraded the gRPC to latest version available as on today Dec-29-2017. Here is the sequence I tried.
D1229 18:26:44.677266881 5399 tcp_posix.c:184] read: error={"created":"@1514597204.677255710","description":"Socket closed","fd":19,"file":"src/core/lib/iomgr/tcp_posix.c","file_line":250,"target_address":"ipv4:74.125.201.95:443"}
Issued another request at 18:27:38.348023815. It is successful at this time.
Waited 3 minutes, and tried another request at 18:30:50.006342607. It is successful as well.
Waited 4 minutes and tried at 18:34:59.179743317. This time, I got the "Endpoint read failed" error.
Tried immediately another request at 18:35:11.271402277, this time it is successful.
Then the request at 18:39:31.524507462. That is after 4 min idle, I got the same error.
Although I didn't try many times, whatever I tested so far suggests that somewhere there is a 4 min idle timeout. Not sure, if it is at the spanner server side or at the gRPC side.
Here is the full trace
D1229 18:34:59.179743317 5399 tcp_posix.c:457] WRITE 0x1c8bdf0 (peer=ipv4:74.125.124.95:443): 17 03 03 06 1c 00 00 00 00 00 00 00 07 b1 db f1 25 8f d8 88 10 b3 60 97 e7 ad 30 63 f2 cb 7f f1 b2 a8 d3 03 d9 71 87 87 04 39 e4 50 09 b1 4e 4f 82 b8 96 ea 4e 97 5e a4 f5 34 b6 f6 2b f6 8b fb 87 95 6a 79 a2 47 18 7d 4e 70 5b 42 72 37 0b c5 f3 94 13 c0 3a 7b d7 0f 2c 03 81 ef dd a6 90 99 e1 e5 3a 46 3e 4c 0e 2c 01 c0 33 d2 6a 68 c3 a2 a3 80 01 1d 48 82 b6 34 d0 16 5f b6 29 ec 53 39 ee 64 8b 0e 2c e8 84 bc 74 17 75 d5 e0 ff ee cf 58 47 b0 13 fa a6 72 3b 16 3b 2f b7 84 84 96 f0 16 f9 24 00 f9 43 4b 56 a6 9a 60 31 3e b8 90 f2 39 65 07 42 cc 0f 19 a3 f5 3f e3 bd e5 a8 7a ed 6b 07 7f 55 df 18 9a 60 c8 40 0c df fb ee 24 1a df 8b 35 2f 83 37 fc ec b2 14 f7 90 6e 9f 6d ee 95 f9 fd e2 31 16 d4 01 e2 ab 68 79 ab 79 97 4e 96 d8 e3 a7 f8 55 84 1f 16 b8 f9 4d 51 b3 11 42 74 db 32 c8 78 30 88 c5 f2 c7 78 41 88 2e 96 14 a9 a4 d2 81 ba 32 67 60 8d 85 87 45 27 22 d2 41 01 3e 1a 8e 85 54 bd 69 c5 e7 78 3c 28 77 3b 24 b7 54 a6 18 25 93 c1 57 57 84 a8 16 8d 59 5c 4f dc 76 9a 6a 47 c5 42 4c a1 ee 07 06 d8 88 1e de 98 51 4b 7b d8 a2 12 f0 03 95 a6 61 9f a8 70 86 cc 5b 35 4e 5e 25 22 79 4e 4a 35 49 3e 9c 6e 90 12 4a 3a d4 8a 8d 03 8a a8 61 bb 56 b9 ab ac 15 6b be 8b 32 9e dd a3 48 62 4e 38 4f e5 58 98 e1 1b 60 89 db e0 8e 69 0b 0a 7d 53 28 72 db 40 2e 6d 4d 31 d3 fe 8a 69 95 7f 1f 37 7e b8 61 cc 47 70 aa 0c 66 4b fe 26 ad d1 8e 1b 96 e3 93 aa 82 6b db ca 11 79 ba c8 0e 95 7f 5d 52 56 90 8b f8 5f 7f 27 9c 7f d8 e2 8d d9 9b d3 9b 6c d1 44 bf 8c c7 b0 e8 f2 8f 45 ad 9f 3b 5e 07 42 2d ea 5f 16 92 76 a7 28 6c dc f8 c8 db 2b d0 fe 6f 47 ae 1e cc 9b 5f fd 6b 38 75 6d d5 01 48 4f d8 5a 38 92 5c 29 1f be 78 87 d0 86 3a 1d 49 0d ff 86 b0 ba ff 9d 2a f5 d3 2f 51 8a f3 2f 8d 4f 25 dd 78 52 a5 91 02 a6 d3 b9 02 2a cb cb 2b ff 2c b0 0b 14 02 37 f3 f5 30 a0 74 19 8e 63 bc 83 e2 63 e8 a9 c2 e5 ef 37 ac a7 e7 33 ba b2 14 88 b2 6a 6f fa 8c 64 58 75 0f e8 6b b7 ed 6a 9e aa 53 64 d4 35 f8 b0 c2 60 0f 70 19 9d 2f 23 10 1b 8e ce f0 55 7b 65 19 b3 8d c2 fe f5 d3 b6 71 35 d8 84 30 05 36 e9 32 de 04 68 58 f2 07 b4 f4 52 92 91 8f 68 1f fe 28 ea 08 55 df 59 d9 88 80 ed fc 32 18 69 3c 59 52 69 6a b1 c2 db ee 69 ad 3c 85 cb 9a cc c5 c2 62 99 92 97 f2 65 74 31 7e 74 0a cb fb ca a0 ed a6 26 61 69 85 b9 3b 36 eb 8b 53 e6 b7 1c 59 16 fd 0a 26 a0 89 5f c0 73 2b ad b4 0b 25 42 f5 be 66 d0 97 80 c8 23 85 c9 09 0e 20 54 f7 44 b3 f8 a5 db 7f 22 91 d1 f9 fb 0f a9 31 d4 8b 47 89 b0 5e 71 2a df 33 62 31 46 5c 2c 99 f0 49 ca dc 3d 41 33 b1 b0 7d db 62 8e 07 28 01 7b e8 39 78 30 1f e3 9c 5b d0 d2 a2 60 60 66 aa 2c 87 9c 7c 16 86 47 6e 64 03 11 4c ab db f8 ab 69 62 f8 79 f6 a7 ab 2d 25 c6 fd 35 ea 9c f5 2c 3e e4 8b 68 37 71 5a 77 7f 10 0d 4f b2 f1 c9 84 b2 0f 58 cf fd fe 1c 68 ed 5e b6 7c 57 10 07 32 8a 76 c0 1c 6b 9f 41 0e 8b b4 58 a8 97 31 dd 35 0c bf 25 3e ac b0 34 77 77 8b 64 53 76 ba 92 ca eb b3 b0 a9 30 3c 77 12 7e 3f 9f 99 35 ce 82 ca 0c 78 3f 37 b7 25 e8 39 e5 a1 e0 86 4d 8e 90 fb c0 ab 8e a2 2d e6 bd 6a 6a 8a 4c 8a 31 7b da 5f 55 0d 0f 04 e4 c7 ab 30 33 33 f7 cf 95 00 c9 58 6f 69 5b 06 08 b4 e0 4b 3b cb 90 b1 4d 58 c5 2f fb 90 1f b4 4f c1 b3 78 4f c6 90 21 45 12 d9 57 ec 5e ee 06 0e 3e 21 29 65 5a 5f ee 25 27 bf 87 f7 33 39 ef 1d 8e 37 f3 55 e8 2f 33 67 15 7b d7 c4 34 7e 63 78 38 e5 d8 ce 9c dc d0 02 93 65 ef 19 21 91 8b 67 67 ba bb 97 68 a7 b4 c8 41 cd 32 cc 86 33 27 09 92 b4 6f 0c ba 9d 0a e8 52 1f 2d 5e 17 64 5e 40 07 f9 2c 1e 3a 1b 1a 00 a1 97 01 30 f4 90 82 90 95 26 12 b4 e5 51 10 2b 39 e1 9a 1d 00 7a 12 f3 f9 3b d5 fc 28 32 79 67 8e 43 af 71 8b 10 f5 5b 22 4c d7 d9 71 8d e8 ba 43 1d db 66 9a ff 0e 43 67 fd c4 ce cc ba f9 00 05 e5 f1 2e bb f2 37 0e 6d 26 72 92 f0 da b5 25 8d 85 bb 7f cb e1 ff 86 c4 f2 9d 10 90 aa 10 c1 62 82 4b 87 83 e2 d1 f6 f0 cd ee ef 84 48 1c c9 46 26 3c f5 e3 ab c3 64 98 69 2a 8f 22 29 c9 bd f7 16 9c 1c 4e 61 ad de 1d 74 d7 84 3d ef fb b8 3c 2e b1 56 1d a5 1d 88 73 2a 0f 73 c3 d7 e2 71 b3 36 f3 73 4d b7 82 05 78 fa 52 e6 1b 88 a8 ea 9a e6 0e 2f fa 00 a9 4d 7e be 63 b7 50 1e 6f 1d 16 88 cf c5 81 d7 c1 ff 8e 0c 1e 60 10 f3 1e 91 f9 3f 5e e0 2c fb 2b b4 ae c9 60 4e d7 31 bd 96 5e 3e a4 e6 ca 8e 65 e3 9a 36 0c 3c ec 7e 5d e5 71 74 05 5d d2 00 4e b8 33 b2 83 63 aa bb 42 f8 9b 1b 5c d0 35 b3 9c b9 36 7c f1 63 5f e1 3f 9a 4f 24 80 8b 9e b8 83 23 06 f0 91 74 1f 27 81 47 5a 7c b3 d2 0b 9f 4b 4c 01 4b b5 ca dc 0c 02 b7 48 69 e4 61 23 89 2e ad 83 8e 09 c6 57 a3 86 9e 94 ba a0 e4 70 4b e1 20 9a 1b bf 71 e7 a5 02 4f 30 88 1c 4c ec 46 bb 24 0a 48 5c 3d 9d 5a a8 d1 cb ff 9e b3 4f e3 17 20 b4 cc 29 39 53 ca fe f9 02 5e 27 da 40 8e 26 27 c8 17 4e 45 54 09 41 8b eb 9f ea bb 8b da b6 21 65 6f bb 2c 75 9e 65 fe 89 35 81 52 80 06 f5 15 f2 bd 24 cd 0f 1b 83 e6 34 7c 44 f9 e4 b6 9c ca af 0e 0f 48 f5 47 ab 28 d0 26 af 4d 1b ab f1 df 0f 10 3c 6d 24 d1 07 9c 56 cb 3f b2 1f 18 '................%........0c.........q...9.P..NO....N.^..4..+.....jy.G.}Np[Br7......:{..,.........:F>L.,..3.jh......H..4.._.).S9.d..,...t.u.....XG....r;.;/.......$..CKV..1>...9e.B.....?....z.k..U....@....$...5/.7......n.m.....1.....hy.y.N.....U.....MQ..Bt.2.x0....xA.........2g...E'".A.>...T.i..x<(w;$.T..%..WW....Y\O.v.jG.BL.........QK{.......a..p..[5N^%"yNJ5I>.n..J:......a.V....k..2...HbN8O.X.......i..}S([email protected]~.a.Gp..fK.&.........k...y.....]RV..._.'.........l.D.......E..;^.B-._..v.(l....+..oG...._.k8um..HO.Z8.\)..x...:.I.......*../Q../.O%.xR.......*..+.,....7..0.t..c...c.....7...3.....jo..dXu..k..j..Sd.5....p../#.....U{e........q5..0.6.2..hX....R...h..(..U.Y.....2.i
D1229 18:34:59.179864081 5399 tcp_posix.c:184] read: error="No Error"
D1229 18:34:59.179873202 5399 tcp_posix.c:189] READ 0x1c8bdf0 (peer=ipv4:74.125.124.95:443): 17 03 03 00 21 00 00 00 00 00 00 00 0d c7 3f 08 02 79 ca 87 a1 dd 16 e3 74 14 c2 65 e4 bb 23 41 e7 c4 2a c0 91 d8 17 03 03 00 3a 00 00 00 00 00 00 00 0e 0a ed 7b 68 c2 36 9e e6 d7 37 36 0b 7b f1 2c d9 bc 0c 03 13 3e dc 70 c6 bf 48 ad 68 40 e1 f6 f3 e8 66 df 19 da 08 b2 f1 b5 ef c6 34 30 1e 77 91 41 5c '....!.........?..y......t..e..#A..*.......:..........{h.6...76.{.,.....>.p..H.[email protected]\'
D1229 18:34:59.179915521 5399 tcp_posix.c:184] read: error={"created":"@1514597699.179906456","description":"Socket closed","fd":19,"file":"src/core/lib/iomgr/tcp_posix.c","file_line":250,"target_address":"ipv4:74.125.124.95:443"}
D1229 18:34:59.179944839 5399 tcp_posix.c:457] WRITE 0x1c8bdf0 (peer=ipv4:74.125.124.95:443): 17 03 03 00 27 00 00 00 00 00 00 00 08 f4 c2 6c d2 54 87 a6 4d 47 65 db 0c 4f a8 58 41 b5 05 25 2c db 85 02 5e 56 61 a0 73 08 61 eb '....'..........l.T..MGe..O.XA..%,...^Va.s.a.'
D1229 18:34:59.179964997 5399 tcp_posix.c:487] write: "No Error"
Error {"is_success":false,"success_value":null,"error_obj":{"error_code":20000,"error_category":1,"error_params":["ExecuteSql","Endpoint read failed","{\"sql\":\"\n SELECT shard_id,\n xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@zip \",\"query_mode\":0,\"session\":\"projects/xxxxxx/instances/my-instance/databases/my-database/sessions/ABuzUfGrOsZsrZJl9yGzbUwy4TXQPkjgfvJUCsValBnv2FzFdMsiDic7Zm4O\",\"params\":{\"strNumber\":\"3411\",\"strName\":\"16TH ST NW\",\"city\":\"WASHINGTON\",\"state\":\"DC\",\"zip\":\"20010\"},\"param_types\":{}}"]}}
Thanks for that information. @murgatroid99 -- what seems like the correct solution, given this pattern described above? Is there a setting we need to change when we instantiate the gRPC client? Or is this something the GCF team needs to handle from their end? Thanks!
Is this comment relevant from [email protected]
https://groups.google.com/forum/#!topic/cloud-functions-beta-testers/TR3G99q2mIs
"It appears that Cloud Functions can keep themselves warm to some extent. I use Spanner and do:
const database = instance.database(databaseId, {keepAlive : 0.1}); // keepAlive in minutes
This causes the Spanner client library to ping the Spanner server every 6 seconds; without this, the Spanner server will drop the connection after a timeout, and then the next command may fail instead of automatically reconnect. The end-user code should be robust enough to retry failed commands, but it seems like not many people bother. Keeping these background connections open is important with Spanner, because just establishing the server connection takes ~3 seconds, so you couldn't really do that for each Cloud Function invocation.
When a "warm" Cloud Function instantiation is not handling a request, it does still have the ability to use a tiny bit of CPU.
I don't use Pub/Sub, but I wonder if your initial issue arose because the Pub/Sub connection was broken, but this didn't get reported because of this CPU throttling, and the "connection lag" got interpreted as "no outstanding messages"."
Also having this issue with FireStore and PHP client. The first request after ~5 minutes returns an error, the second goes fine.
https://github.com/googlecloudplatform/google-cloud-php#cloud-firestore-beta
$store = new FirestoreClient([
'keyFile' => ['permissions']
]);
$document = $store->document('records/1');
{
"message": "OS Error",
"code": 14,
"status": "UNAVAILABLE",
"details": []
}
Surprisingly, this method works fine.
$collection = $store->collection('records');
Ping @murgatroid99 -- can you check out https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2438#issuecomment-354795595?
I would recommend testing this again with grpc 1.8.4. We changed the status codes of some errors to "UNAVAILABLE", which the Google Cloud libraries retries, and I think this might be one of those errors.
This error also occurs using the '@google-cloud/datastore' api.
Should I open another issue?
For now, we are downgraded to [email protected] due to another bug: https://github.com/googleapis/gax-nodejs/pull/178
We've just released these modules which include the upgrade to gRPC 1.9. This should correct all problems reported in this issue.
@google-cloud/datastore v1.3.5
@google-cloud/logging v1.1.7
@google-cloud/pubsub v1.6.5
@google-cloud/spanner v1.2.0
Was seeing this issue using @google-cloud/monitoring v0.4.1 to write some metrics via a cloud function. Tried upgrading to the not yet released 0.5.0 that upgrades to grpc v1.9.1 and I'm seeing a similar but not identical error message.
Error: 14 UNAVAILABLE: TCP Read failed
at createStatusError (/user_code/node_modules/@google-cloud/monitoring/node_modules/grpc/src/client.js:64)
at (/user_code/node_modules/@google-cloud/monitoring/node_modules/grpc/src/client.js:583)
Is this a different manifestation of the same issue or should I open a new ticket?
That is a closely related error. As mentioned, it is now getting reported with the UNAVAILABLE status code. I thought the Google Cloud library would retry requests that fail with that status code.
I'm seeing
{ Error: 14 UNAVAILABLE: TCP Read failed at new createStatusError (/user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/client.js:64:15) at /user_code/node_modules/@google-cloud/datastore/node_modules/grpc/src/client.js:583:15 code: 14, metadata: Metadata { _internal_repr: {} }, details: 'TCP Read failed' }
When trying to save() on datastore via GCF.
@stephenplusplus Sadly, I'm still seeing this issue. As many people mentioned it, it only happens on the first request. I'm using @google-cloud/spanner v1.4.0
The full stack trace is:
```
A 0|app | Error: Cannot find module '/app/node_modules/@google-cloud/spanner/node_modules/@google-cloud/common-grpc/node_modules/grpc/src/node/src/client.js'
A 0|app | at Function.Module._resolveFilename (module.js:542:15)
A 0|app | at Module._load (module.js:472:25)
A 0|app | at loadAndPatch (/app/node_modules/@google-cloud/trace-agent/build/src/trace-plugin-loader.js:141:36)
A 0|app | at Function.Module_load [as _load] (/app/node_modules/@google-cloud/trace-agent/build/src/trace-plugin-loader.js:173:37)
A 0|app | at Module.require (module.js:585:17)
A 0|app | at require (internal/module.js:11:18)
A 0|app | at Object.
A 0|app | at Module._compile (module.js:641:30)
A 0|app | at Object.Module._extensions..js (module.js:652:10)
A 0|app | at Module.load (module.js:560:32)
```
@WaldoJeffers based on the error mesage that seems an unrelated error. Can you open a new issue for this?
Same Issue with logging-bunyan package ran from Firebase Functions.
try-catch block doesn't appear to catch these either, so if it errors, the function quits.
Also seeing an issue using logging-bunyan within Firebase Functions
Error: 14 UNAVAILABLE: TCP Read failed
at Object.exports.createStatusError (/user_code/node_modules/@google-cloud/logging-bunyan/node_modules/grpc/src/common.js:87:15)
at Object.onReceiveStatus (/user_code/node_modules/@google-cloud/logging-bunyan/node_modules/grpc/src/client_interceptors.js:1188:28)
at InterceptingListener._callNext (/user_code/node_modules/@google-cloud/logging-bunyan/node_modules/grpc/src/client_interceptors.js:564:42)
at InterceptingListener.onReceiveStatus (/user_code/node_modules/@google-cloud/logging-bunyan/node_modules/grpc/src/client_interceptors.js:614:8)
at callback (/user_code/node_modules/@google-cloud/logging-bunyan/node_modules/grpc/src/client_interceptors.js:841:24)
Hello. I'm using @google-cloud/datastore (v1.4.1) to write entities through a stream.
I'm having no issue running the code locally but I get an error when I run the on a cloud function.
Here's my log trace :
{
insertId: "000000-080c5482-2dd4-4ee6-9227-3f15ae868137"
labels: {…}
logName: "xxx"
receiveTimestamp: "2018-08-16T16:42:27.680546524Z"
resource: {…}
severity: "INFO"
textPayload: "{ Error: 14 UNAVAILABLE: TCP Read failed
at Object.exports.createStatusError (/srv/node_modules/grpc/src/common.js:87:15)
at Object.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:1188:28)
at InterceptingListener._callNext (/srv/node_modules/grpc/src/client_interceptors.js:564:42)
at InterceptingListener.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:614:8)
at callback (/srv/node_modules/grpc/src/client_interceptors.js:841:24)
code: 14,
metadata: Metadata { _internal_repr: {} },
details: 'TCP Read failed' }"
timestamp: "2018-08-16T16:42:20.211Z"
}
The only difference I can see between the two different environments setup is that I'm passing a credential object required by Google libs to connect to Google Service (Eg. projectId, keyFilename) when I run locally. Nothing is passed on cloud function environment as it's supposed to be implicit Auth across GCP services. (This has been verified with using @google-cloud/storage lib both local and function)
Well, oh bother. I can reliably reproduce this. I have an app engine standard nodejs app that seems to always toss a 500 on my first request after using it for the first time in a while. The output in cloud logging looks like this:
A { Error: 14 UNAVAILABLE: TCP Read failed
A at Object.exports.createStatusError (/srv/node_modules/grpc/src/common.js:91:15)
A at Object.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:1204:28)
A at InterceptingListener._callNext (/srv/node_modules/grpc/src/client_interceptors.js:568:42)
A at InterceptingListener.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:618:8)
A at callback (/srv/node_modules/grpc/src/client_interceptors.js:845:24)
A code: 14,
A metadata: Metadata { _internal_repr: {} },
A details: 'TCP Read failed' }
This is happening when trying to do a simple read from firestore:
const db = new Firestore();
const jobs = db.collection('jobs');
await jobs.doc(job.id).set(job)
After doing a refresh and making the second call, all appears to be working fine. This does not happen locally, or in a docker container.
@stephenplusplus @alexander-fenster @murgatroid99 @ofrobots
I can give folks access to my project if this is useful. I am going to work on an independent repo. FWIW, the app is:
https://npmtrace.appspot.com
You have to search for a package name/version that hasn't been indexed. You can find the sources here:
https://github.com/JustinBeckwith/npmtrace
Smells a little like an internal networking issue (TCP protocol support surface), which may be unrelated to GRPC or client libraries specifically. @JustinBeckwith perhaps create an internal bug and reference this issue for context?
Same error here using @google-cloud/logging: 4.1.1 on GAE NodeJS8 Standard Environment.
Error: 14 UNAVAILABLE: TCP Read failed
at Object.exports.createStatusError (/srv/node_modules/grpc/src/common.js:87:15)
at Object.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:1188:28)
at InterceptingListener._callNext (/srv/node_modules/grpc/src/client_interceptors.js:564:42)
at InterceptingListener.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:614:8)
at callback (/srv/node_modules/grpc/src/client_interceptors.js:841:24)
First errors detected in three different projects in the first week of December.
I've got the same error message in Firebase Functions's log:
{ Error: 14 UNAVAILABLE: TCP Read failed
at Object.exports.createStatusError (/user_code/node_modules/firebase-admin/node_modules/grpc/src/common.js:87:15)
at Object.onReceiveStatus (/user_code/node_modules/firebase-admin/node_modules/grpc/src/client_interceptors.js:1188:28)
at InterceptingListener._callNext (/user_code/node_modules/firebase-admin/node_modules/grpc/src/client_interceptors.js:564:42)
at InterceptingListener.onReceiveStatus (/user_code/node_modules/firebase-admin/node_modules/grpc/src/client_interceptors.js:614:8)
at callback (/user_code/node_modules/firebase-admin/node_modules/grpc/src/client_interceptors.js:841:24)
code: 14,
metadata: Metadata { _internal_repr: {} },
details: 'TCP Read failed' }
I am getting this error as well, GAE standard node 10 environment. I am using @google-cloud/logging-winston which is is using @google-cloud/[email protected].
Error: 14 UNAVAILABLE: TCP Read failed
at Object.exports.createStatusError (/srv/node_modules/grpc/src/common.js:87:15)
at Object.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:1188:28)
at InterceptingListener._callNext (/srv/node_modules/grpc/src/client_interceptors.js:564:42)
at InterceptingListener.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:614:8)
at callback (/srv/node_modules/grpc/src/client_interceptors.js:841:24)
I'm also facing the same issue where after some idle time, grpc throws UNAVAILABLE error.
is there a way to work around this problem as I didn't find any solution on this thread.
@JustinBeckwith RE: https://github.com/googleapis/google-cloud-node/issues/2438#issuecomment-450755179 -- have you filed an issue or found anything?
Yeah, there's an internal bug on cloud functions here. They're doing some things to increase a timeout, but this issue absolutely still exists.
@JustinBeckwith does that also apply to GAE NodeJS8 Standard Environment as well?
Possibly. Are you seeing a similar callstack in an app engine standard app? cc @steren
I can't remember/find the stack trace I'm afraid (I starred this issue after this comment
https://github.com/googleapis/google-cloud-node/issues/2438#issuecomment-454587844) :(
I originally came here because of googleapis/nodejs-logging-winston#190, and just reverted back to a previous version of @google-cloud/logging-winston.
@JustinBeckwith fwiw I am having this issue in GAE Standard with node 10. Do you know if there is an open issue within google/or their issue tracker? I am using the nodejs-logging-winston and see the TCP read failed and at the same time my redis connection to a GCE instance also has problems and gets a ECONNRESET.
- 2019-03-06 14:38:04.758 CST [ioredis] Error: read ECONNRESET ... at TCP.onStreamRead (internal/stream_base_commons.js:111:27)
- 2019-03-06 14:43:27.776 CST Winston encountered an error - Error: 14 UNAVAILABLE: TCP Read failed
Is it an issue in how internal communication is handled?
Not sure if this is unrelated, since the original issue was related to logging
Recently, we've been getting these more in our Node 8 cloud functions. We've isolated it down to a part of the code that writes to Firestore.
It may be coincidental, but there seems to be a pattern. We tend to get this error in batches (i.e., many of them around the same time and then it stops). It also tends to happen around the time our app's load spikes. I have a hypothesis that during traffic bursts there may be some issue setting up resources as instances our scaled? Just a wild guess...
Error: 14 UNAVAILABLE: TCP Read failed
at Object.exports.createStatusError (common.js:87)
at Object.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:1188)
at InterceptingListener._callNext (/srv/node_modules/grpc/src/client_interceptors.js:564)
at InterceptingListener.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:614)
at callback (/srv/node_modules/grpc/src/client_interceptors.js:841)
Greetings folks! I _think_ we've resolved this issue. With the latest and greatest version of all libraries, you should now be getting a dependency on @grpc/grpc-js instead of grpc. This module is rewritten from the ground up, and uses the native HTTP/2 support in node core. We think the combination of this new module, and some fixes to timeouts on the Cloud Functions/App Engine backend should resolve the issue.
If you are running into this still ... please let us know! Just make sure you're using the latest version of the module, and that you post a stack trace (as above).
Edit: I was wrong
Any work arounds for this issue?
Not currently, but we're making progress on a fix
Hi folks,
We made some changes in @grpc/grpc-js https://github.com/grpc/grpc-node/pull/1021 that might fix the problem described here. They are releases as v0.5.3. Can I bother you updating, checking if the issue is still here with v0.5.3, and sending the stack trace back if something still does not work?
Thanks!
@deremer :wave: hey, I just wanted to check in and see if @alexander-fenster's changes in @grpc/grpc-js helped?
@deremer 👋 hey, I just wanted to check in and see if @alexander-fenster's changes in
@grpc/grpc-jshelped?
We haven’t had a release to deploy with updated dependencies yet. I will report back once we do so and get some data. Lots of other folks on here too so they might see results sooner.
@deremer @juanparadox @kyle-mccarthy, can you folks confirm that the latest release of @google-cloud/logging has addressed this issue for you?
@deremer I'm going to go ahead and close this issue, because it's been almost a month, and seems like things have been working okay for folks.
Please feel free to re-open if you are continuing to bump into these issues.
This error popped up this morning for the first time.
{ Error: 14 UNAVAILABLE: TCP Read failed
at Object.exports.createStatusError (/srv/node_modules/grpc/src/common.js:91:15)
at Object.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:1204:28)
at InterceptingListener._callNext (/srv/node_modules/grpc/src/client_interceptors.js:568:42)
at InterceptingListener.onReceiveStatus (/srv/node_modules/grpc/src/client_interceptors.js:618:8)
at callback (/srv/node_modules/grpc/src/client_interceptors.js:845:24)
code: 14,
metadata: Metadata { _internal_repr: {} },
details: 'TCP Read failed' }
Greetings! Could you open a new issue with details?
Most helpful comment
I'm seeing
When trying to save() on datastore via GCF.