Rubberduck: External type definitions for VBA

Created on 6 Oct 2019  路  16Comments  路  Source: rubberduck-vba/Rubberduck

Justification
Consider the following code (using the Excel object model):

Dim col As Sheets
Set col = Application.Worksheets.Item(1)

This code will fail, because col is a Sheets collection, while the object returned from Worksheets.Item(1) is a Worksheet.

The VBA compiler cannot currently catch this sort of thing, because Application.Worksheets returns a reference to the Sheets collection, and Sheets.Item can return either of two types:

  • if you pass in an array of numbers or strings, you get back a Sheets object, containing the corresponding Worksheet objects
  • if you pass in a single string or number, you get back a single Worksheet

and the Excel object libraries can only represent a single type here, so the type library specifies Object as the return type.

Description
It should be possible to define somewhere (include as a module? stored in a GH repo that RD can download from?) something like this (using VB.NET syntax, and a VB.NET proposed syntax for composite types):

Namespace Excel
    Class Sheets(Of TSheet)
        Default Property Item(Integer Or String) As TSheet
        Default Property Item(Array(Of Integer Or String)) As Sheets(Of TSheet)
    End Class

    Class Worksheet
    End Class

    Class Application
        Property Worksheets As Sheets(Of Worksheet)
    End Class
End Namespace

Then RD could catch that Worksheets.Item(1) returns a Worksheet, and cannot be assigned to a variable of type Sheets.

Additional context
From my experience in mapping the Office object libraries to Typescript definitions, if these type definitions would allow defining multiple overloads even for properties, that would be sufficient. However, adding the following type features would make writing and reading these definitions more convenient:

  • Union types
  • Generics

How should the definition be related to the library currently referenced in the VBA project?

Perhaps using the GUID. The definition would apply to multiple GUIDs, although this means that as overloads are added and removed between versions, there should be some way to indicate that a particular overload applies to a particular version or range of versions.

enhancement

Most helpful comment

Chained calls isn't the problem per se. The problem is when the chain includes variant/object which effectively renders the code unverifiable at the compile-time.

IMO, an unverifiable code is a bug time bomb and yes I do encourage creating a strongly typed variable.

A common pattern I see in Access is to do stuff like:

Me.MySubForm.Form.SomeControl.Value = "foo"

It's verifiable up to the point of Form, which returns an Access.Form which is generic and has no specific information about a control named SomeControl that exists on the particular form. Thus, I encourage people to prefer this syntax:

Dim MySub As Form_MySub
Set MySub = Me.MySubForm.Form
MySub.SomeControl.Value = "foo"

Note that we still have chained access but it's now 100% verifiable. That is the main thing that I care about.

All 16 comments

Sheets is the return type of the Worksheets property; Sheets is also another property that also yields a Sheets reference, only that collection might include sheets that Worksheets wouldn't (e.g. Chart sheets).

I like this idea... basically document the "overloads" in various type libraries.. I think we could make that work with an xml format; this one would have an overload definition taking a string, another taking a long, and another taking an array (any array I guess).

That's the beauty of allowing union types: you have two overloads one taking a String Or Integer which returns a single Sheet and another taking an Array(Of String Or Integer) which returns a Sheets collection.

Adding generics into the mix means we could define Worksheets as returning a Sheets(Of Worksheet), which would mean that any individual item in the collection would be a Worksheet (updated the definition in the original comment).

Something of note here: large paths of the type library will use Variant for that purpose. Getting proper union types for all type library members implies a lot of work. While I can understand that supplementing Rubberduck's resolver with type information for this, it should be noted that this could be significant amounts of manual work.

In addition loading external typelib definitions would further slow down the resolver. What you're proposing results in really cool capabilities, but needs a ton of work put in up front.

@Vogel612 It doesn't need to be upfront. This is stuff that could be added gradually, and part of compiling RD into an installer would be to get the latest version of these definitions.

I should also add that I've done most of this work for the Excel, Access and (a large part of the) Word libraries (among others), for the Typescript definitions at DefinitelyTyped (for example, Application.Worksheets and the Sheets collection). I think I would be able to compare the current version against the auto-generated definitions, and port the differences into some XML format, or other VBA-consumable format.

The mental model I have here (if it's not yet obvious) is how Typescript leverages type definitions in external files even when the actual source is not in Typescript.

@retailcoder I'm not sure how much I like the idea of an XML format. It might be a pain to write. Having a VB-style format would be easier.

OTOH, having an XML format would make it possible to consume from other languages that need this information, like Typescript or Python, or even .NET languages, where the union types could be represented as multiple overloads.

The advantage of an XML format is that we might be able to piggyback of enterprise-grade type information interchange formats like XMI, which also have existing toolchains that could enable users to supplement shipped external type definitions with their own external type definitions.

Another option would be to fall back to ABI for interchange and loading. We already have some code dealing with ABI, if memory serves, so that could be adapted...

I like the idea, but it will be a major effort to incorporate this into the resolver.

First of all, I think we should not replace any information we currently hold, because that would certainly break half of RD. After all RD is built on the assumption that we deal with VBA/VB6, which neither supports union data types nor overloads. I think it you rather be handled as MaybeAsType and as MaybeReferences collected on the side and usable by inspections.

Dealing with multiple different types is doable in the resolver, but will basically require enhancing every single binding we currently use, because we have to take all the possible resolution paths into accouont. My main headache around this is how to prune the maybe references based on the information from deeper down in the binding path. E.g. currently the reolution of the accessed part of a member access is final when we come to the binding of the member, but with multiple alternatives, it is not. An example of this is the following expression.

ActiveWorkbook.Sheets(1).ChartArea

Here the type of ActiveWorkbook.Sheets(1) is Object, entirely unabiguously even before we look at ChartArea, which will be of compile time type Variant since it is an unbound member call. Now, using some maybe types, I would have both Worksheet and Chart. Only after looking at the member, we know that it can only be Chart. This will add quite another level of complexity to the resolver.

@MDoerner

we should not replace any information we currently hold

Information generated from the referenced type libraries in the current project? Or is there static information baked into RD itself?

What I mean is that we should not replace the AsType on the declarations we create from the typelib and user code. Having more than one type would confuse half of RD because the assumption throughout the code base is that a declaration can only have one type and that something like Union data types do not exist.

@MDoerner

My main headache around this is how to prune the maybe references based on the information from deeper down in the binding path.

My first thought on reading this was "Typescript doesn't have this problem." Then I realized that Typescript forces you to typecast (ActiveWorkbook.Sheets(1) as Chart) in order to get ChartArea. And VBA doesn't have any sort of typecasting beyond assigning to a variable,.

If this functionality would be limited to the types of method arguments:

  • preventing misuse of arguments: Application.Worksheets(Application)
  • determining the (single) return type of the object: Application.Worksheets(1) returns a Sheets(Of Worksheet), so Application.Worksheets(1).ChartArea would be invalid

I think it would still provide value, without the specific complexity you mentioned.

Would it be possible to do in two parts?

Just a thought - the first implementation could stop at the first translation. At that point, we would be able to generate new inspections "AmbiguousTypeResult" for unbound calls with suggestions for creating a strong-typed variables. They would have to do fix it one by one in the chain but at least it would be a first step in the right direction.

My problem is not how to do it in VBA. My problem is that I will have to filter the list of possible resolutions (in our internal data) based on information gathered further down the resolution path. That is something that is simply not necessary right now, because of how VBA works.

@bclothier But if someone is used to writing in chained methods, then forcing creation of new variables is a little uncomfortable. Other languages have more natural typecasting syntax, so it makes more sense.

if someone is used to writing in chained methods, then forcing creation of new variables is a little uncomfortable.

I would say "just too bad". If someone is used to writing chained late-bound method calls, IMO Rubberduck's job would be to tell them "this member call here, is late-bound - you want a variable here".

When that someone gets used to intellisense and compile-time validation for every single one of their member calls, they'll never look back is what I'm willing to place my bet on.

Chained calls isn't the problem per se. The problem is when the chain includes variant/object which effectively renders the code unverifiable at the compile-time.

IMO, an unverifiable code is a bug time bomb and yes I do encourage creating a strongly typed variable.

A common pattern I see in Access is to do stuff like:

Me.MySubForm.Form.SomeControl.Value = "foo"

It's verifiable up to the point of Form, which returns an Access.Form which is generic and has no specific information about a control named SomeControl that exists on the particular form. Thus, I encourage people to prefer this syntax:

Dim MySub As Form_MySub
Set MySub = Me.MySubForm.Form
MySub.SomeControl.Value = "foo"

Note that we still have chained access but it's now 100% verifiable. That is the main thing that I care about.

ref. #4936

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Gener4tor picture Gener4tor  路  3Comments

bclothier picture bclothier  路  3Comments

susnick picture susnick  路  3Comments

eteichm picture eteichm  路  4Comments

ChrisBrackett picture ChrisBrackett  路  3Comments