Azure-sdk-for-js: Update Open Telemetry dependency

Created on 4 Feb 2021  路  10Comments  路  Source: Azure/azure-sdk-for-js

Currently our @opentelemetry/api dependency is at ^0.10.2.
The latest version is 0.16.0
We should update

cc @xirzec

Azure.Core Client

All 10 comments

Should be an easy update, as I believe the surface we touch hasn't changed.

Looks like @azure/quantum-jobs has. "@types/node": "^10.0.0". CC @HarshaNalluru @sarangan12 @vxfield : is types/node 10.x required for quantum-jobs or can we use 8.x?

I've commented on their PR.. https://github.com/Azure/azure-sdk-for-js/pull/13480#discussion_r569873396

Quantum Jobs is a very thin layer, I believe it's an unintentional switch, we can use 8.x.

Should be an easy update, as I believe the surface we touch hasn't changed.

So far it looks like there is no longer a .parent field on Span but it appears that there is a new parameter to startSpan (context) that is used to extract similar information. I'm just winding through the compiler errors so far, haven't yet gone a chance to run this.

@witemple-msft, @HarshaNalluru: Please take a look at PR #13612 with regards to @azure/quantum-jobs.
In there, I set the dependency back to "@types/node": "^8.0.0" instead of "@types/node": "^10.0.0".
Hopefully it address this issue.

Just merged PR #13612 to bring @azure/quantum-jobs dependency to "@types/node": "^8.0.0" instead of "@types/node": "^10.0.0"

Based on some of the work I've been doing I think we're going to need so give people a chance to help. Writing this down here and will create some sub-issues to enlist others.

Getting upgraded to opentelemetry 0.16.0

I've been working on the opentelemetry upgrade and the work to do it is a bit larger than expected.

The big changes that hit us the most are the spanOptions.parent field being removed, which breaks a lot of our code that assumed that it only had to pass spanOptions to do proper context propagation. Another is that the Span status code is no longer of type CanonicalCode - it is now StatusCode which has 3 values (ERROR, OK, and UNSET).

Talking with @sadasant, it seemed wise to give other people that have time an opportunity a chance to
help. As part of doing this I've noticed some bugs as well, so this is a good time to review tracing and options
passing and make sure it works as expected.

I've created PR#13665 to start the conversion work but it is incomplete. I'm still discussing the naming details and other pieces with @xirzec but it should be good enough to start, even if we have to do a few small renames afterwards.

How you can help?

Look at this example of migrated code

Here's an example of how I changed AppConfig. I believe @xirzec is still noodling over the pattern but this can at least serve as an example of the minimum that needs to happen for your spans to be properly reported:

In AppConfig I have code that looks like this around each generated client call:

 // NOTE: `this._trace` here is just an alias for trace, which I'll show below this example.
 return this._trace("addConfigurationSetting", options, async (newOptions) => {
  const originalResponse = await this.client.putKeyValue(configurationSetting.key, {
   ifNoneMatch: "*",
   label: configurationSetting.label,
   entity: configurationSetting,
   ...newOptions
  });

  return transformKeyValueResponse(originalResponse);
 });

Where the trace method is here:

/** @internal */
export const createSpan = createSpanFunction({
  namespace: "Microsoft.AppConfiguration",
  packagePrefix: "Azure.Data.AppConfiguration"
});

export async function trace<ReturnT>(
  operationName: keyof AppConfigurationClient,
  options: OperationOptions,
  fn: (options: OperationOptions, span: Span) => Promise<ReturnT>,
): Promise<ReturnT> {
  const { updatedOptions, span } = createSpan(
    operationName,
    options,
    options.tracingOptions?.context
  );

  try {
    // NOTE: we really do need to await on this function here so we can handle any exceptions thrown and properly
    // close the span.
    return await fn(updatedOptions, span);
  } catch (err) {
    span.setStatus({
      code: StatusCode.ERROR,
      message: err.message
    });
    throw err;
  } finally {
    span.end();
  }
}

This does mean that the trace function shows up in your callstack. An alternative is simply to mimic this flow and inline the try/catch/finally logic into your method directly.

Use the centralized createSpanFunction* methods from core-tracing

A few of us have custom functions that handle propagating state between various options type but the purpose appears to be the same and nobody appears to be trying to do anything innovative. We can centralize this code in core-tracing, which should make future maintenance easier as well.

There are two exported functions to handle creating a span, properly parented, based on a passed in
OperationOptions (createSpanFunctionForOperationOptions) or RequestOptionsBase (createSpanFunctionForRequestOptionsBase). I chose to give these ugly names just to make it clear
which type you thought you were working with, since it's difficult to provide type checking on types that
have no required fields.

NOTE: I'd like to just remove the RequestOptionsBase variant and just have a single createSpanFunction that just properly assumes OperationOptions, as we did before. We can see, after conversion, if we still need to support RequestOptionsBase (hopefully NOT).

Remove unnecessaryOperationOptions conversions

Some of us are still doing the operationOptionsToRequestOptions call (in some cases, erroneously)
but the underlying generated client now takes OperationOptions. You can remove the unneeded conversion code
and pass the operation options directly. Here are some errors I've seen because of the conversion code.

RequestOptionsBase is too permissive

One potentially common mistake (seen a few times) is caused because RequestOptionsBase is defined like this:

   interface RequestOptionsBase {
    // other fields elided...

    // this here is a problem.
    [key: string]: any;
   }

This can mask errors where your the options type for your method doesn't contain fields you intended to.

Example:

function method(options: MethodOptions = {}): Promise<MethodReturn> {
 const requestOptions = operationOptionsToRequestOptionsBase(options);
 const span = createSpan("spanName", requestOptions);

 // !!! requestOptions doesn't really have 'credentials' property but the overly permissive 
 // declaration for RequestOptionsBase allows it, and just types it as 'any'
 const credentials: IssuerCredentials = requestOptions.credentials || {};

If you change over to OperationOptions instead then this is no longer an issue.

Accidentally propagating the wrong type

I've seen a few variants of this as I was doing my conversion:

function clientFunc(options?: OptionsBasedOnOperationOptions) {
 const converted: RequestOptionsBase = operationOptionsToRequestOptionsBase(options);
 innerClientFunc(converted);
}

function innerClientFunc(options?: RequestOptionsBase) {
 // NOTE: `generatedClientFunc` actually takes an OperationOptions but this compiles perfectly
 // even if it's wrong!
 generatedClientFunc(options);
}

function generatedClientFunc(options?: OperationOptions) {
}

So make sure you as you're porting over that you go all the way down to the generated code to see if
you have to propagate RequestOptionsBase.

Test your telemetry to make sure it's sensible

Testing is pretty easy to do locally. This is ad-hoc but it's a good first step.

  1. Install zipkin (simplest path to getting something local to test with). Predictably, I use a Docker container using this one liner:
docker run --rm -d -p 9411:9411 --name zipkin openzipkin/zipkin

After this runs you can browse to localhost:9411 and see the Zipkin UI. If you want to trash the container (to start clean) you can just do:

docker stop zipkin

Which will also automatically delete it (since we specified --rm above). Then just run the docker run command again.

  1. Use this boilerplate for your program so your spans get reported to zipkin.

package.json

{
 // other stuff elided.

 "dependencies": {
   "@opentelemetry/api": "^0.16.0",
   "@opentelemetry/core": "^0.16.0",
   "@opentelemetry/exporter-zipkin": "^0.16.0",
   "@opentelemetry/node": "^0.16.0",
   "@opentelemetry/plugin-express": "^0.13.0",
   "@opentelemetry/plugin-http": "^0.16.0",
   "@opentelemetry/plugin-https": "^0.16.0",
   "@opentelemetry/tracing": "^0.16.0"
 },
 "devDependencies": {
   "ts-node": "^9.1.1",
   "@types/node": "^14.14.25",
   "dotenv": "^8.2.0",
   "typescript": "^4.1.3"
  }
}

index.ts

import * as otapi from "@opentelemetry/api";
import { LogLevel } from "@opentelemetry/core";
import { NodeTracerProvider } from "@opentelemetry/node";
import { SimpleSpanProcessor } from "@opentelemetry/tracing";
import { ZipkinExporter } from "@opentelemetry/exporter-zipkin";
import * as dotenv from "dotenv";
import { AppConfigurationClient } from "@azure/app-configuration";
import { setTracer } from "@azure/core-tracing";

// useful for _you_ to load up your configuration for your client (creds, whatever)
dotenv.config();

const provider = new NodeTracerProvider({
    logLevel: LogLevel.ERROR
});

provider.register();

provider.addSpanProcessor(
    new SimpleSpanProcessor(
        new ZipkinExporter({
         serviceName: "using new telemetry"
        })
    )
);

const tracer: otapi.Tracer = provider.getTracer("basic");
setTracer(tracer);

async function main() {
 const span = tracer.startSpan("my arbitrary outer span", undefined, otapi.context.active());
 const myOwnContext = otapi.setSpan(otapi.context.active(), span);

 const operationOptions: OperationOptions = {
  tracingOptions: {
   spanOptions: {},
   context: myOwnContext
  }
 };

 // example (make sure you pass in those options!)
 await client.setConfigurationSetting(setting, operationOptions);

 span.end();

 provider.shutdown().then(() => {
        console.log(`Telemetry was sent`);
 });
}

main();
  1. Look in zipkin to see if your spans match what you expect.

@xirzec: After talking with @ramya-rao-a we came up with an incremental plan that should allow me to do this upgrade piecemeal. This should allow us to split things up for stack owners that are interested.

I'll update this issue once I get the ball rolling (should be within the next day or so).

Okay, this is now complete with #14208 having been merged in. Closing.

OpenTelemetry is obviously approaching GA, so I'll create another issue (https://github.com/Azure/azure-sdk-for-js/issues/14596) to track that upgrade when it comes in.

Was this page helpful?
0 / 5 - 0 ratings