Google-cloud-java: Performance tests in pub/sub show that the publisher is using 50-100x more CPU than subscriber

Created on 13 Apr 2018 · 6 Comments · Source: googleapis/google-cloud-java

I've been running performance tests with Pub/Sub, trying to find the right combination of threads, memory, buffer sizes, etc. I'm seeing the Publisher take ~500% CPU versus 5-8% for the Subscriber running on the same server, simultaneously handling the same message throughput. Does that surprise anyone? Looking at the running threads, I see that a number of them have the following stack trace, which ends in the RSA/SHA256 JWT signing code. I suspect that this is to a large degree the real limitation of the Publisher's performance.

Is anyone else seeing this? Any recommendations on how to work around it? Is there something I can do at the transport layer to turn these signing calls off or reduce their usage? Maybe gRPC configuration tweaks?

Thanks. This is using v0.38 on a Linux box running Java 7 outside of Google's cloud. Would running on Google Compute Engine instances make a difference?

java.math.BigInteger.oddModPow(BigInteger.java:2716) 
java.math.BigInteger.modPow(BigInteger.java:2459) 
sun.security.rsa.RSACore.crtCrypt(RSACore.java:183) 
sun.security.rsa.RSACore.rsa(RSACore.java:122) 
sun.security.rsa.RSASignature.engineSign(RSASignature.java:175) 
java.security.Signature$Delegate.engineSign(Signature.java:1207) 
java.security.Signature.sign(Signature.java:579) 
com.google.api.client.util.SecurityUtils.sign(SecurityUtils.java:147) 
com.google.api.client.json.webtoken.JsonWebSignature.signUsingRsaSha256(JsonWebSignature.java:637) 
com.google.auth.oauth2.ServiceAccountJwtAccessCredentials.getJwtAccess(ServiceAccountJwtAccessCredentials.java:300) 
com.google.auth.oauth2.ServiceAccountJwtAccessCredentials.getRequestMetadata(ServiceAccountJwtAccessCredentials.java:267) 
com.google.auth.Credentials.blockingGetToCallback(Credentials.java:103) 
com.google.auth.oauth2.ServiceAccountJwtAccessCredentials.getRequestMetadata(ServiceAccountJwtAccessCredentials.java:251) 
io.grpc.auth.GoogleAuthLibraryCallCredentials.applyRequestMetadata(GoogleAuthLibraryCallCredentials.java:90) 
io.grpc.internal.CallCredentialsApplyingTransportFactory$CallCredentialsApplyingTransport.newStream(CallCredentialsApplyingTransportFactory.java:91) 
io.grpc.internal.ClientCallImpl.start(ClientCallImpl.java:242) 
io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1.start(CensusTracingModule.java:387) 
io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1.start(CensusStatsModule.java:679) 
io.grpc.ForwardingClientCall.start(ForwardingClientCall.java:32) 
com.google.api.gax.grpc.GrpcHeaderInterceptor$1.start(GrpcHeaderInterceptor.java:95) 
io.grpc.stub.ClientCalls.startCall(ClientCalls.java:293) 
io.grpc.stub.ClientCalls.asyncUnaryRequestCall(ClientCalls.java:268) 
io.grpc.stub.ClientCalls.futureUnaryCall(ClientCalls.java:177) 
com.google.pubsub.v1.PublisherGrpc$PublisherFutureStub.publish(PublisherGrpc.java:538) 
com.google.cloud.pubsub.v1.Publisher.publishOutstandingBatch(Publisher.java:333) 
com.google.cloud.pubsub.v1.Publisher.access$000(Publisher.java:90) 
com.google.cloud.pubsub.v1.Publisher$1.run(Publisher.java:255) 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
java.util.concurrent.FutureTask.run(FutureTask.java:266) 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
java.lang.Thread.run(Thread.java:745) 
pubsub performance question
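
One way to put a number on "a number of them have the following stack trace" is to sample the JVM's live threads and count how many are currently inside a given method. The following is a minimal sketch using only the JDK; the class name `SigningThreadCounter` and the method `countThreadsIn` are hypothetical helpers, not part of any Google library.

```java
import java.util.Map;

/** Hypothetical diagnostic helper; not part of google-cloud-java. */
public class SigningThreadCounter {

    /**
     * Counts live threads whose current stack contains a frame for the
     * given class and method. Each thread is counted at most once.
     */
    public static int countThreadsIn(String className, String methodName) {
        int count = 0;
        // Snapshot of every live thread's stack at this instant.
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // do not count the same thread twice
                }
            }
        }
        return count;
    }
}
```

Calling `SigningThreadCounter.countThreadsIn("sun.security.rsa.RSASignature", "engineSign")` periodically from inside the publishing process would show how many threads are signing at each sample. A single snapshot is only a rough indicator, so several samples are needed before drawing conclusions.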

All 6 comments

Yes, this is extremely surprising. When we load-test the Pub/Sub client, we usually observe that subscribers take a little more CPU than publishers. Our load tests run in GCE, though. Could you share how you're authenticating the client?

I'm not sure what you mean. I'm using something like:

credentialsProvider =
    FixedCredentialsProvider.create(GoogleCredentials.fromStream(jsonFileInputStream));
...
Publisher.Builder publisherBuilder = Publisher.newBuilder(googleTopicName);
publisherBuilder.setCredentialsProvider(credentialsProvider);
publisherBuilder.setRetrySettings(buildRetrySettings());
publisherBuilder.setBatchingSettings(buildBatchingSettings());
// thread settings
publisherBuilder.setExecutorProvider(InstantiatingExecutorProvider.newBuilder()
        .setExecutorThreadCount(configuration.getPublisherThreadCount()).build());
publisher = publisherBuilder.build();

What else do you need me to show? Are there authentication settings that I'm missing or otherwise setting incorrectly, @pongad? Thanks.

I just found this ticket after submitting my own. I am seeing similar results inside Google's cloud (GKE), as I pointed out in #3194.

My authentication method is similar to yours. Our base Docker image is CentOS 7, if that makes a difference. We saw similar results with 0.39.0-beta. We also have dependencies on

"io.grpc:grpc-netty:1.10.0"
"io.grpc:grpc-protobuf:1.10.0"
"io.grpc:grpc-stub:1.10.0"

in our project. I mention this because the fact that we use gRPC/netty has had an impact before on separate issues.

I think I found the issue. From the stack trace, you seem to be using JWT for auth. Until recently, JWT tokens weren't cached, so you were doing the expensive auth negotiation on every single call to the publish RPC. The subscribe side wasn't affected because it uses a long-running streaming call.

This was fixed in https://github.com/google/google-auth-library-java/commit/664754ee1208fe17472e41d10aa752851f610e7e on the auth side. We'll upgrade our library in this repo.
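
To illustrate why caching matters here, the sketch below signs a token with RSA/SHA-256 once and reuses it until it expires, instead of signing on every call. It is not the code from the linked commit: the class `CachedTokenSigner`, its methods, the simplified token format, and the one-hour lifetime in the usage note are all assumptions made for illustration. It needs Java 8 or later for `java.util.Base64`.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.Signature;
import java.util.Base64;

/** Hypothetical illustration; not the google-auth-library implementation. */
public class CachedTokenSigner {
    private final PrivateKey key;
    private final long lifetimeMillis;
    private String cachedToken;   // last signed token, or null if none yet
    private long expiresAtMillis; // time after which cachedToken is not reused
    private int signCount;        // how many RSA signatures were computed

    public CachedTokenSigner(PrivateKey key, long lifetimeMillis) {
        this.key = key;
        this.lifetimeMillis = lifetimeMillis;
    }

    /** Returns the cached token, signing a new one only when it is missing or expired. */
    public synchronized String getToken(long nowMillis) {
        if (cachedToken == null || nowMillis >= expiresAtMillis) {
            expiresAtMillis = nowMillis + lifetimeMillis;
            // Simplified payload; a real JWT also carries a header and more claims.
            cachedToken = sign("{\"exp\":" + (expiresAtMillis / 1000) + "}");
        }
        return cachedToken;
    }

    public synchronized int getSignCount() {
        return signCount;
    }

    /** The expensive step: one RSA private-key operation per call. */
    private String sign(String payload) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(payload.getBytes(StandardCharsets.UTF_8));
            signCount++;
            return payload + "."
                    + Base64.getUrlEncoder().withoutPadding().encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Generates a throwaway 2048-bit RSA key for experiments. */
    public static PrivateKey newThrowawayKey() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair().getPrivate();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a lifetime of, say, 3,600,000 ms, repeated `getToken` calls inside that window return the same string and `getSignCount()` stays at 1. Without the cache, each publish RPC would pay for a fresh `Signature.sign()`, which is where the `BigInteger.modPow` frames in the stack trace come from.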

Wow. Huge win with 0.38.0 for example. I still see the Publisher use more CPU than the Subscriber but the difference now is like ~3x instead of 50-100x. Throughput seems to have increased so that my program is now hitting other limits. Thanks much @pongad.

For those who don't want to wait, it's an easy pom.xml exclusion stanza. Before you ask, the only difference that I can see between versions 0.9.0 and 0.9.1 of the auth library is the addition of the JWT token caching.

<dependency>    
    <groupId>com.google.cloud</groupId> 
    <artifactId>google-cloud-pubsub</artifactId>    
    <version>0.##.0-beta</version>  
    <exclusions>    
        <exclusion> 
            <groupId>com.google.auth</groupId>  
            <artifactId>google-auth-library-credentials</artifactId>    
        </exclusion>    
        <exclusion> 
            <groupId>com.google.auth</groupId>  
            <artifactId>google-auth-library-oauth2-http</artifactId>    
        </exclusion>    
    </exclusions>
</dependency>   
<dependency>    
    <groupId>com.google.auth</groupId>  
    <artifactId>google-auth-library-credentials</artifactId>    
    <version>0.9.1</version>    
</dependency>   
<dependency>    
    <groupId>com.google.auth</groupId>  
    <artifactId>google-auth-library-oauth2-http</artifactId>    
    <version>0.9.1</version>    
</dependency>
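
After overriding a transitive dependency like this, it can be worth confirming which jar a class is actually loaded from at runtime. The following is a rough sketch using only JDK reflection; `ClasspathVersionCheck` and `describe` are hypothetical names, and whether a version string appears depends on what the jar's manifest declares.

```java
import java.security.CodeSource;

/** Hypothetical helper for checking where a class was loaded from. */
public class ClasspathVersionCheck {

    /** Returns the class name, its manifest version if any, and its jar location if known. */
    public static String describe(Class<?> cls) {
        Package pkg = cls.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        // Classes loaded by the bootstrap loader have no code source.
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "(bootstrap or unknown)"
                : source.getLocation().toString();
        return cls.getName()
                + " version=" + (version == null ? "(not in manifest)" : version)
                + " from=" + location;
    }
}
```

In a project that has the auth library on its classpath, printing `ClasspathVersionCheck.describe(com.google.auth.oauth2.ServiceAccountJwtAccessCredentials.class)` should show the path of the jar that was resolved, which normally includes the version in the file name.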

Thank you both, the improvement is HUGE for us as well with this solution.
