Justification
Consider the following code (using the Excel object model):
Dim col As Sheets
Set col = Application.Worksheets.Item(1)
This code will fail, because col is a Sheets collection, while the object returned from Worksheets.Item(1) is a Worksheet.
The VBA compiler cannot currently catch this sort of thing, because Application.Worksheets returns a reference to the Sheets collection, and Sheets.Item can return either of two types:
Sheets object, containing the corresponding Worksheet objectsWorksheetand the Excel object libraries can only represent a single type here, so the type library specifies Object as the return type.
Description
It should be possible to define somewhere (include as a module? stored in a GH repo that RD can download from?) something like this (using VB.NET syntax, and a VB.NET proposed syntax for composite types):
Namespace Excel
Class Sheets(Of TSheet)
Default Property Item(Integer Or String) As TSheet
Default Property Item(Array(Of Integer Or String)) As Sheets(Of TSheet)
End Class
Class Worksheet
End Class
Class Application
Property Worksheets As Sheets(Of Worksheet)
End Class
End Namespace
Then RD could catch that Worksheets.Item(1) returns a Worksheet, and cannot be assigned to a variable of type Sheets.
Additional context
From my experience in mapping the Office object libraries to Typescript definitions, if these type definitions would allow defining multiple overloads even for properties, that would be sufficient. However, adding the following type features would make writing and reading these definitions more convenient:
Perhaps using the GUID. The definition would apply to multiple GUIDs, although this means that as overloads are added and removed between versions, there should be some way to indicate that a particular overload applies to a particular version or range of versions.
Sheets is the return type of the Worksheets property; Sheets is also another property that also yields a Sheets reference, only that collection might include sheets that Worksheets wouldn't (e.g. Chart sheets).
I like this idea... basically document the "overloads" in various type libraries.. I think we could make that work with an xml format; this one would have an overload definition taking a string, another taking a long, and another taking an array (any array I guess).
That's the beauty of allowing union types: you have two overloads one taking a String Or Integer which returns a single Sheet and another taking an Array(Of String Or Integer) which returns a Sheets collection.
Adding generics into the mix means we could define Worksheets as returning a Sheets(Of Worksheet), which would mean that any individual item in the collection would be a Worksheet (updated the definition in the original comment).
Something of note here: large paths of the type library will use Variant for that purpose. Getting proper union types for all type library members implies a lot of work. While I can understand that supplementing Rubberduck's resolver with type information for this, it should be noted that this could be significant amounts of manual work.
In addition loading external typelib definitions would further slow down the resolver. What you're proposing results in really cool capabilities, but needs a ton of work put in up front.
@Vogel612 It doesn't need to be upfront. This is stuff that could be added gradually, and part of compiling RD into an installer would be to get the latest version of these definitions.
I should also add that I've done most of this work for the Excel, Access and (a large part of the) Word libraries (among others), for the Typescript definitions at DefinitelyTyped (for example, Application.Worksheets and the Sheets collection). I think I would be able to compare the current version against the auto-generated definitions, and port the differences into some XML format, or other VBA-consumable format.
The mental model I have here (if it's not yet obvious) is how Typescript leverages type definitions in external files even when the actual source is not in Typescript.
@retailcoder I'm not sure how much I like the idea of an XML format. It might be a pain to write. Having a VB-style format would be easier.
OTOH, having an XML format would make it possible to consume from other languages that need this information, like Typescript or Python, or even .NET languages, where the union types could be represented as multiple overloads.
The advantage of an XML format is that we might be able to piggyback of enterprise-grade type information interchange formats like XMI, which also have existing toolchains that could enable users to supplement shipped external type definitions with their own external type definitions.
Another option would be to fall back to ABI for interchange and loading. We already have some code dealing with ABI, if memory serves, so that could be adapted...
I like the idea, but it will be a major effort to incorporate this into the resolver.
First of all, I think we should not replace any information we currently hold, because that would certainly break half of RD. After all RD is built on the assumption that we deal with VBA/VB6, which neither supports union data types nor overloads. I think it you rather be handled as MaybeAsType and as MaybeReferences collected on the side and usable by inspections.
Dealing with multiple different types is doable in the resolver, but will basically require enhancing every single binding we currently use, because we have to take all the possible resolution paths into accouont. My main headache around this is how to prune the maybe references based on the information from deeper down in the binding path. E.g. currently the reolution of the accessed part of a member access is final when we come to the binding of the member, but with multiple alternatives, it is not. An example of this is the following expression.
ActiveWorkbook.Sheets(1).ChartArea
Here the type of ActiveWorkbook.Sheets(1) is Object, entirely unabiguously even before we look at ChartArea, which will be of compile time type Variant since it is an unbound member call. Now, using some maybe types, I would have both Worksheet and Chart. Only after looking at the member, we know that it can only be Chart. This will add quite another level of complexity to the resolver.
@MDoerner
we should not replace any information we currently hold
Information generated from the referenced type libraries in the current project? Or is there static information baked into RD itself?
What I mean is that we should not replace the AsType on the declarations we create from the typelib and user code. Having more than one type would confuse half of RD because the assumption throughout the code base is that a declaration can only have one type and that something like Union data types do not exist.
@MDoerner
My main headache around this is how to prune the maybe references based on the information from deeper down in the binding path.
My first thought on reading this was "Typescript doesn't have this problem." Then I realized that Typescript forces you to typecast (ActiveWorkbook.Sheets(1) as Chart) in order to get ChartArea. And VBA doesn't have any sort of typecasting beyond assigning to a variable,.
If this functionality would be limited to the types of method arguments:
Application.Worksheets(Application)Application.Worksheets(1) returns a Sheets(Of Worksheet), so Application.Worksheets(1).ChartArea would be invalidI think it would still provide value, without the specific complexity you mentioned.
Would it be possible to do in two parts?
Just a thought - the first implementation could stop at the first translation. At that point, we would be able to generate new inspections "AmbiguousTypeResult" for unbound calls with suggestions for creating a strong-typed variables. They would have to do fix it one by one in the chain but at least it would be a first step in the right direction.
My problem is not how to do it in VBA. My problem is that I will have to filter the list of possible resolutions (in our internal data) based on information gathered further down the resolution path. That is something that is simply not necessary right now, because of how VBA works.
@bclothier But if someone is used to writing in chained methods, then forcing creation of new variables is a little uncomfortable. Other languages have more natural typecasting syntax, so it makes more sense.
if someone is used to writing in chained methods, then forcing creation of new variables is a little uncomfortable.
I would say "just too bad". If someone is used to writing chained late-bound method calls, IMO Rubberduck's job would be to tell them "this member call here, is late-bound - you want a variable here".
When that someone gets used to intellisense and compile-time validation for every single one of their member calls, they'll never look back is what I'm willing to place my bet on.
Chained calls isn't the problem per se. The problem is when the chain includes variant/object which effectively renders the code unverifiable at the compile-time.
IMO, an unverifiable code is a bug time bomb and yes I do encourage creating a strongly typed variable.
A common pattern I see in Access is to do stuff like:
Me.MySubForm.Form.SomeControl.Value = "foo"
It's verifiable up to the point of Form, which returns an Access.Form which is generic and has no specific information about a control named SomeControl that exists on the particular form. Thus, I encourage people to prefer this syntax:
Dim MySub As Form_MySub
Set MySub = Me.MySubForm.Form
MySub.SomeControl.Value = "foo"
Note that we still have chained access but it's now 100% verifiable. That is the main thing that I care about.
ref. #4936
Most helpful comment
Chained calls isn't the problem per se. The problem is when the chain includes variant/object which effectively renders the code unverifiable at the compile-time.
IMO, an unverifiable code is a bug time bomb and yes I do encourage creating a strongly typed variable.
A common pattern I see in Access is to do stuff like:
Me.MySubForm.Form.SomeControl.Value = "foo"It's verifiable up to the point of
Form, which returns anAccess.Formwhich is generic and has no specific information about a control namedSomeControlthat exists on the particular form. Thus, I encourage people to prefer this syntax:Note that we still have chained access but it's now 100% verifiable. That is the main thing that I care about.