Is GitHub Copilot Any Good for Business Central Development?

TL;DR

No. At least, not yet.

Maybe later.

But even then, maybe not.

Intro

For now this is necessarily a simple, first impressions post about GitHub Copilot. I’ve used it for a few weeks, tweeted enthusiastically in the first couple of days’ use and have now disabled it. What is it, how does it work, what’s good, what’s not so good?

What is GitHub Copilot?

Your AI pair programmer. With GitHub Copilot, get suggestions for whole lines or entire functions right inside your editor.

copilot.github.com

Copilot is a service from GitHub, accessed through a Visual Studio Code extension, which provides suggestions for code and comments as you are typing.

How Does it Work?

how it works
Image taken from https://copilot.github.com/

I won’t pretend to know much about the inner workings of the extension but this is how I understand it. GitHub clearly have an enormous amount of open source code under their control for all manner of programming languages. They have used that code to train an AI model.

The Copilot VS Code extension takes your existing code and comments, feeds that to the model and attempts to predict what you might want to type next. The best suggestion shows up faintly beyond the cursor and you can tab to accept the suggestion. Alternatively you can load the top 10 suggestions and select the best one to insert into your code.

Some Examples

Below is an example where I have created a new procedure called ValidateEmailAddress. We’ll ignore the obvious issues with the suggestions for now and just show it as an example of how the extension works.

ValidateEmailAddress example

Given that I’ve called the new method ValidateEmailAddress the extension has generated (apparently generated and not just straight copied from an existing repo) the suggested code which I have accepted by pressing TAB each time.

Here’s another example. Let’s create a new CreateSalesOrder method and view the suggested solutions.

CreateSalesOrder example

A couple of interesting things to notice here. Most of the solutions attempt to do something with the Order Origin Code field. That’s the only other code in the file, so Copilot seems to be giving a lot of weight to that in the suggestions. Some of the suggestions use keywords from other languages: this, var and new.

More Context

Those are the sorts of examples that you’ll find in YouTube videos which are enthusiastic about Copilot. Enter a method name, type a short comment that describes what the code should do and Copilot automagically suggests an implementation for your method. Any examples that you are going to be impressed by are likely to be written in JavaScript, Ruby or Python, not AL. More about that later.

Perhaps an example where we provide more context is fairer. The below example is of writing an automated test. Notice that I already have a test to check that releasing a sales order without a certain field populated throws an error. I’m going to create another test to check the same behaviour for sales invoices.

Writing an automated test with Copilot

Copilot has a much better time of suggesting the code this time, following the pattern of the above automated test. Not only do all of the suggestions compile, but they are correct. Now we’re getting somewhere.

Yes, and no. I’m not knocking it – but all it is actually doing is copying the above test and replacing all instances of “order” with “invoice”. We can already do that in VS Code in a few keystrokes – much faster than messing around with Copilot.

That said, it does demonstrate that Copilot “learns” from the code that you’ve already written and can make repetitive work more efficient. I almost always start an automated test with Init(); and finish with one or more calls to the Assert codeunit. Copilot quickly starts suggesting those lines, even with appropriate comments in calls to Assert.AreEqual(). Hence my initial enthusiasm and tweet.

Problems

What’s the problem then? Why does Copilot keep suggesting code that doesn’t compile? The biggest problem is that Copilot just hasn’t seen enough AL to know what it should look like. The suggestions do improve if you give it more context and write blocks of code which are similar to something you’ve written before, but ultimately the model hasn’t been trained on enough AL code.

But wait. Isn’t the whole of the base app for the past few versions on GitHub now? That’s thousands of AL files. Yes, it is – but that is a few H2O molecules in a drop in GitHub’s ocean. At the time of writing there are fewer than 100 GitHub repos containing AL code and approximately 31K AL files (see here).

Sounds like a lot of files…until you compare it to other languages.

Language      Approx. No. of Files
AL            31 thousand
Python        100+ million
Java          200+ million
PHP           600+ million
C             1.2+ billion
JavaScript    1.4+ billion
https://github.com/search?q=language%3Ajavascript&type=code

If you’re trying to solve a problem in JavaScript it is pretty certain that someone, somewhere writing open source code has solved it before. Copilot ought to be able to generate a range of sensible solutions. Whether you want those suggestions is another matter.

Pair Programming

GitHub describe Copilot as “Your AI pair programmer.” That’s nice. As long as you like pair programming with someone in your ear the whole time suggesting the next line or two that you should write – before you have time to think for yourself.

I think that’s been my biggest issue with it. When I speak to someone else about what I’m working on I usually want to talk bigger picture. Do I understand the requirements? How does this affect some existing functionality? How does this integrate with other apps that we are writing?

When I write a method declaration I want to give some thought to how I’m going to approach the implementation. When Copilot pops up with some suggested implementation invariably I get distracted reading it rather than thinking about what code to write myself. Even if they were good suggestions I’m not sure that I’d like that. The fact that most of the suggestions use syntax from another language and don’t even compile makes it even worse.

Conclusion

Copilot is an interesting concept and I’ve enjoyed playing with it. There have definitely been some moments when I’ve been very impressed with the suggestions that it has made. There have been times when it has saved me time writing some repetitive code, or code which changes predictably between lines. You could compare it to auto-fill in Excel: give it a few examples to demonstrate how the value changes each row and then auto-fill as many rows as you need following the same pattern.

However, I pretty quickly found the annoyances outweighed the benefits. The novelty wore off and I realised I was less productive with Copilot than without it. Maybe the suggestions will improve over time. Then again, maybe AL is just too niche and we can’t expect it ever to work as well as it does with JavaScript.

Presumably at some point Microsoft are going to monetise this, but I can’t see AL developers paying for it. For now the extension is going to remain installed, but disabled.

Dmitry has submitted a session at DynamicsCon about Copilot. If you found this interesting you should check it out. Maybe he’ll be more enthusiastic than me.

Test Explorer in Visual Studio Code

The July 2021 release of Visual Studio Code (1.59) introduced a new testing API and Test Explorer UI. From v0.6.0 this API is used by AL Test Runner.

Test Explorer Demo

Improvements

UI

The biggest improvement is the Test Explorer view which shows your test codeunits, their test methods and the status of each.

Hovering over a test gives you three icons to run, debug or open an editor at the test.

You can run and debug all the tests in a given codeunit by hovering over the codeunit name or run and debug all tests at the top.

The filter box allows you to easily find specific tests, which I’ve found useful in projects with several test codeunits and hundreds of tests.

You can also filter to only show failed tests or only tests which are present in the codeunit in the current editor. The explorer supports different ways of sorting and displaying the tests.

Icons are added into the gutter alongside test methods in the editor. Left click to run the test or right click to see this context menu with more options.

The old “Run Test” and “Debug Test” codelens actions are also still added above the test definition.

Commands & Shortcuts

A whole set of new commands is introduced, with keyboard chords beginning with Ctrl + ; (Ctrl and semicolon). The existing AL Test Runner keyboard shortcuts still work but there are some nice options in the new set – like “Test: Rerun Last Run” to repeat the last run test without having to navigate to it again.

Using the Test Explorer

Using the Test Explorer is pretty self-explanatory if you’ve already been using AL Test Runner. When you open your workspace/folder the tests should be automatically discovered and loaded into the Test Explorer view. On first opening, all of the tests will have no status, i.e. neither pass nor fail – but results from now on will be persisted.

Running one or more tests – regardless of where you run them from (Test Explorer, Command Palette, CodeLens, Keyboard Shortcut) – will start a test run. You’ll see “Running tests…” in the Status Bar.

Once the test(s) have finished running you’ll see the results at the top of the Test Explorer, “x / y tests passed (z %)”, and the status icons by each test will be updated.

If the tests do not actually run, e.g. because your container isn’t started, then the test run will not finish and “Running tests” will continue to spin at the bottom of the screen. You can stop the run manually from the top of the Test Explorer, fix the problem and go again.

Using Code Coverage in Business Central Development

Intro

Sample code coverage summary

In the latest version of AL Test Runner I’ve added an overall code coverage percentage, along with totals for the number of lines hit and the number of lines. I hesitated over whether to add this in previous versions. Let me explain why.

Measuring Code Coverage

First, what these stats actually are. From right to left:

Code Coverage 79% (501/636)
  1. The total number of code lines in objects which were hit by the tests (636)
  2. The total number of lines hit by the tests (501)
  3. The percentage of the code lines hit in objects which were hit at least once (79%)

Notice that the stats only include objects which were hit by your tests. You might have a codeunit with thousands of lines of code, but if it isn’t hit at all by your tests it won’t count in the figures. That’s just how the code coverage stats come back from Business Central. Take a look at the file that is downloaded from the test runner if you’re interested (by default it’s saved as codecoverage.json in the .altestrunner folder).

It is important to bear this in mind when you are looking at the headline code coverage figure. If you have hundreds of objects and your tests only hit the code in one of them, but all of the code in that object – the code coverage % will be a misleading 100%. (If you don’t like that you’ll have to take it up with Microsoft, not me).

What Code Coverage Isn’t Good For

OK, but assuming that my tests hit at least some of the code in the most important objects then the overall percentage should be more or less accurate right? In which case we should be able to get an idea of how good the tests are for this project? No.

Code Coverage ≠ Test Quality

The fact that one or more tests hit a block of code does not tell you anything about how good those tests are. The tests could be completely meaningless and the code coverage % alone would not tell you. For example:

procedure CalcStandardDeviation(Values: List of [Decimal]): Decimal
var
    Value, Sum, Mean, SumOfVariance : Decimal;
begin
    foreach Value in Values do
        Sum += Value;
    Mean := Sum / Values.Count();
    foreach Value in Values do
        SumOfVariance += Power((Value - Mean), 2);
    exit(SumOfVariance / Values.Count());
end;

[Test]
procedure TestCalcStandardDeviation()
var
    Values: List of [Decimal];
begin
    Values.Add(1);
    Values.Add(3);
    Values.Add(8);
    Values.Add(12);

    CalcStandardDeviation(Values);
end;

Code coverage? 100% ✅

Does the code work? No ❌ The calculation of the standard deviation is wrong. It is a pointless test: it executes the code but doesn’t verify the result and so doesn’t identify the problem. (In case you’re wondering, the function returns the variance; the result should be the square root of SumOfVariance / Values.Count()).
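
For comparison, here is a rough sketch of a test that would have caught the problem, alongside a corrected calculation. It assumes the test app depends on Microsoft’s "Library Assert" codeunit (older versions call it Assert); the object number and name are made up for illustration. The expected value is worked out by hand: the mean of 1, 3, 8 and 12 is 6, the squared differences add up to 74, and the square root of 74 / 4 is roughly 4.3012.

```al
codeunit 50149 "Std. Deviation Tests"
{
    Subtype = Test;

    var
        Assert: Codeunit "Library Assert"; // assumption: codeunit Assert on older versions

    [Test]
    procedure TestCalcStandardDeviation()
    var
        Values: List of [Decimal];
    begin
        Values.Add(1);
        Values.Add(3);
        Values.Add(8);
        Values.Add(12);

        // The original implementation returns 18.5 (the variance) and would fail here
        Assert.AreNearlyEqual(4.3012, CalcStandardDeviation(Values), 0.001, 'Unexpected standard deviation');
    end;

    local procedure CalcStandardDeviation(Values: List of [Decimal]): Decimal
    var
        Value, Sum, Mean, SumOfSquares : Decimal;
    begin
        foreach Value in Values do
            Sum += Value;
        Mean := Sum / Values.Count();
        foreach Value in Values do
            SumOfSquares += Power(Value - Mean, 2);
        // Population standard deviation: square root of the mean squared difference
        exit(Power(SumOfSquares / Values.Count(), 0.5));
    end;
}
```

The code coverage figure is the same 100% for both versions of the test; only the assertion tells you whether the code works.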

Setting a Target for Code Coverage

What target should we set for code coverage in our projects? Don’t.

Why not? There are a couple of good reasons.

  1. There is likely to be some code in your project that you don’t want to test
  2. You might inadvertently encourage some undesired behaviour from your developers

Why Wouldn’t You Test Some of Your Code?

Personally, I try to avoid testing any code on pages. Tests which rely on test page objects take significantly longer to run, they can’t be debugged with AL Test Runner and I try to minimise the code that I write in pages anyway. Usually I don’t test any of:

  • Code in action triggers
  • Lookup, Drilldown, AssistEdit or page field validation triggers
  • OnOpen, OnClose, OnAfterGetRecord
  • …you get the idea, any of the code on a page

You might also choose not to test code that calls a 3rd party service. You don’t want your tests to become dependent on some other service being available, it is likely to slow the test down and you might end up paying for consumption of the service.

I would test the code that handles the response from the 3rd party but not the code that actually calls it e.g. not the code that sends the HTTP request or writes to a file.

Triggers in Install or Upgrade codeunits will not be tested. You can test the code that is called from those triggers, but not the triggers themselves.

Developing to a Target

When a measure becomes a target, it ceases to be a good measure.

Marilyn Strathern

If we already know that we have some code that we will not write tests for then it doesn’t make a lot of sense to set a hard target of 100%. But, what other number can you pick? Imagine two apps:

  1. An app that is purely responsible for handling communication with some Azure Functions. Perhaps the majority of the code in that app is working with HTTP clients, headers and responses. It might not be practical to achieve code coverage of more than 50%
  2. An app that implements a new sales price mechanism. It is pure AL code and the code is almost entirely in codeunits. It might be perfectly reasonable to expect code coverage of 95%

It doesn’t make sense to have a single headline target for the developers to work to across both projects. Let’s say we’ve agreed as a team that we must have code coverage of at least 75%. We might incentivise developers on the first project to write some nonsense tests just to artificially boost the code coverage.

Meanwhile on the second project some developers might feel safe skipping writing tests for some important new code because the code coverage is already at 80%.

Neither of these scenarios is great, but, in fairness, the developers are doing what we’ve asked them to.

What is Code Coverage Good For?

So what is code coverage good for? It helps to identify objects that have a lot of lines which aren’t hit by tests. That’s why the output is split by object and includes the path to the source file. You can jump to the source file with Alt+Click.

Highlight the lines which were hit by the previous test run with the Toggle Code Coverage command. That way you can make an informed opinion about whether you ought to write some more tests for this part of the code or whether it is fine as it is.

50% code coverage might be fine when 1 out of 2 lines has been hit. It might not be fine when 360 out of 720 lines have been hit – but that’s for you to decide.

Further Reading

https://martinfowler.com/bliki/TestCoverage.html

Increase ODataServicesOperationTimeout for Longer Debugging

TL;DR

Invoke-ScriptInBCContainer [container name] {Set-NavServerConfiguration bc -KeyName ODataServicesOperationTimeout -KeyValue 00:20:00 -ApplyTo All}

Timeout Error

If you use AL Test Runner to debug your tests then you are using the OData services to run the test in the background. OData calls have a timeout that is determined by the ODataServicesOperationTimeout key in the service tier configuration. The timeout is set to 8 minutes by default.

This means that a debug session will be closed after 8 minutes and if you haven’t finished you will receive an error like this:

The operation has been canceled because it took longer to generate rows than the specified threshold (00:08:00). Refine your filters to include less data and try again.

Debugging something for more than 8 minutes isn’t a happy place to be – but it happens. You step into a posting routine, the guts of some complex calculations or you aren’t really sure where to start and have to step through loads of code to try to narrow down the field of investigation. That was me this morning.

You can increase the timeout, for longer, blissfully uninterrupted debugging sessions. Yay. You can use the above command (on the Docker host) to set a new value for the timeout in hh:mm:ss. I don’t know if there is a maximum limit on the timeout but if you need more than 20 minutes, as per the example, then you have my sympathy!

Upgrade Your StrMenus to Confirmation Pages

Exhibit A: StrMenu when posting a sales document

Prompting the User

Sometimes we want to ask users to make a decision before executing some business logic. Often we just want a yes/no answer and can use a simple Confirm.

But what if you can’t phrase the question to give a yes/no answer? Or if there are more than two options for the user to select between?

Traditionally we’ve used the built-in StrMenu command for that. Think of the menu that you’re given when posting a sales, purchase or warehouse document.

StrMenu is nice and easy to use. Just provide a comma-separated string of the options that you want the user to select between, optionally with a default and some instructions, and get an integer value back. 0 indicates that the user clicked the Cancel button, otherwise the return value is the position of the selected option in the string, counting from 1.

Selection := StrMenu(ShipInvoiceQst, DefaultOption);
Ship := Selection in [1, 3];
Invoice := Selection in [2, 3];
if Selection = 0 then
    exit(false);
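
The snippet above leaves out its declarations. A more complete sketch might look like the following; the codeunit number and name, the label texts and the default of 3 are all assumptions for illustration rather than the actual posting code.

```al
codeunit 50102 "Ship Invoice Prompt"
{
    procedure Select(var Ship: Boolean; var Invoice: Boolean): Boolean
    var
        Selection: Integer;
        DefaultOption: Integer;
        ShipInvoiceQst: Label '&Ship,&Invoice,Ship &and Invoice';
        InstructionTxt: Label 'Choose how to post the document.';
    begin
        DefaultOption := 3; // preselect the third option, "Ship and Invoice"
        Selection := StrMenu(ShipInvoiceQst, DefaultOption, InstructionTxt);
        if Selection = 0 then
            exit(false); // the user clicked Cancel

        Ship := Selection in [1, 3];
        Invoice := Selection in [2, 3];
        exit(true);
    end;
}
```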

That’s cool, it works nicely. Let’s not over-complicate things unnecessarily, but you might want to try out an alternative in some situations.

Shortcomings

What’s not to love? Give the user some options, get their selection back. Well…

UI

You’ve got no control over the prompt that the user is given. That might be a good thing if you don’t care what the prompt looks like, or might be a problem if you’ve got specific requirements.

Extensibility

It’s difficult to allow someone else to extend. You could have an event before and after the StrMenu I suppose – allow someone to change the string and read the return value. If you’ve got more than one subscriber this is going to get messy pretty quickly though.
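
To make that concrete, here is a sketch of what the event-based approach might look like. The codeunit, event names and label are hypothetical.

```al
codeunit 50103 "Posting Selection"
{
    procedure Select(): Integer
    var
        Options: Text;
        Selection: Integer;
        ShipInvoiceQst: Label '&Ship,&Invoice,Ship &and Invoice';
    begin
        Options := ShipInvoiceQst;
        // Any subscriber can rewrite the whole option string
        OnBeforeStrMenu(Options);
        Selection := StrMenu(Options, 1);
        // Subscribers have to work out for themselves what Selection now refers to
        OnAfterStrMenu(Options, Selection);
        exit(Selection);
    end;

    [IntegrationEvent(false, false)]
    local procedure OnBeforeStrMenu(var Options: Text)
    begin
    end;

    [IntegrationEvent(false, false)]
    local procedure OnAfterStrMenu(Options: Text; Selection: Integer)
    begin
    end;
}
```

Each subscriber only sees a string and an integer, so two extensions that both append an option have no reliable way of knowing which position belongs to which of them.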

Maintenance

Comma-separated values – you should put the values in a label so that it can be translated, but what if you want to use those values more than once? Maintain them and translate them in more than one place? A set of possible values is a first class citizen in AL now that we’ve got enums.

Confirmation Pages

An alternative to StrMenu is to use a confirmation page to display the values defined by an enum. Have that enum implement an interface and you’ve got a better recipe for maintaining your code and allowing others to extend it.

Example

Imagine that you are creating something to allow users to interact with phone numbers. We’ll have some assist-edit or drilldown code on a phone number field and allow the user to take some action.

Initially we decide that the user will be able to either:

  • Create a new contact and add the phone no. to it
  • Add the phone no. to an existing contact

Phone No. Action Enum

First, create an enum that defines the available options.

enum 50100 "Phone No. Action"
{
    Extensible = true;

    value(0; "Create a new contact")
    {
        Caption = 'Create a new contact';
    }
    value(1; "Add to existing contact")
    {
        Caption = 'Add to existing contact';
    }
}

I’ve left Extensible = true because I want to allow other devs to add new phone no. capabilities later.

ConfirmationDialog Page

Now a page to present the options in the enum to the user. Because it’s a page we’ve more control over what is displayed and how it works. Add instructional text, add the phone no. for information etc. The PageType is ConfirmationDialog. When you add an enum control to the page it will be rendered as radio buttons rather than the usual dropdown list.

page 50100 "Select Phone No. Action"
{
    PageType = ConfirmationDialog;
    ApplicationArea = All;
    UsageCategory = Administration;
    Caption = 'Select Phone No. Action';

    layout
    {
        area(Content)
        {
            group(General)
            {
                ShowCaption = false;
                InstructionalText = 'What would you like to do?';
                field(PhoneNoCtrl; PhoneNo)
                {
                    ApplicationArea = All;
                    Editable = false;
                    Caption = 'Phone No.';
                }
                field(PhoneNoAction; PhoneNoAction)
                {
                    ApplicationArea = All;
                    ShowCaption = false;
                }
            }
        }
    }

    var
        PhoneNoAction: Enum "Phone No. Action";
        PhoneNo: Text;

    internal procedure Intialize(PhoneNo2: Text)
    begin
        PhoneNo := PhoneNo2;
    end;

    internal procedure GetResult(): Enum "Phone No. Action"
    begin
        exit(PhoneNoAction);
    end;
}

Sample ConfirmationDialog page

That gives a page that looks like this. OK, looking good. I’ve also created a codeunit which initialises and calls the page and then retrieves the result.

Having fetched the selected enum value we can call code to create a new contact, have the user select an existing contact or whatever we want.

That’s great, but it could still be better. If another dev adds an enum value it will be displayed on the page correctly but how will their code get called? In a case statement? Should they subscribe to an event? No.

Phone No. Action Interface

We should define an interface that we expect all Phone No. Actions to implement. For now we probably only need one method. I’ll just call it Handle. You could imagine this being more complex in a real example depending on the type of number (Mobile, Landline), country code or some other factors.

interface "Phone No. Action"
{
    procedure Handle(PhoneNo: Text);
}

Now we make the Phone No. Action enum implement that interface. I’ve created a new codeunit for each action. Those codeunits will also implement the interface and have the guts of the business logic for each action.

enum 50100 "Phone No. Action" implements "Phone No. Action"
{
    Extensible = true;

    value(0; "Create a new contact")
    {
        Caption = 'Create a new contact';
        Implementation = "Phone No. Action" = "Create New Contact";
    }
    value(1; "Add to existing contact")
    {
        Caption = 'Add to existing contact';
        Implementation = "Phone No. Action" = "Add to Exist Contact";
    }
}
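
The implementing codeunits aren’t shown here, so the following is only a guess at their shape. The codeunit names match the Implementation properties above; the object numbers and the business logic (standard Contact table, Contact Card and Contact List pages) are assumptions.

```al
codeunit 50110 "Create New Contact" implements "Phone No. Action"
{
    procedure Handle(PhoneNo: Text)
    var
        Contact: Record Contact;
    begin
        Contact.Init();
        Contact.Insert(true); // let the table assign a number from the no. series
        Contact.Validate("Phone No.", CopyStr(PhoneNo, 1, MaxStrLen(Contact."Phone No.")));
        Contact.Modify(true);
        Page.Run(Page::"Contact Card", Contact);
    end;
}

codeunit 50111 "Add to Exist Contact" implements "Phone No. Action"
{
    procedure Handle(PhoneNo: Text)
    var
        Contact: Record Contact;
    begin
        // Let the user pick the contact; do nothing if they cancel
        if Page.RunModal(Page::"Contact List", Contact) <> Action::LookupOK then
            exit;
        Contact.Validate("Phone No.", CopyStr(PhoneNo, 1, MaxStrLen(Contact."Phone No.")));
        Contact.Modify(true);
    end;
}
```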

Calling the Confirmation Page and Interface

Finally, this is some sample code that calls the confirmation page to have the user make a selection and then handle their response. Assign the selected action to the interface variable and call its Handle method.

procedure Select(PhoneNo: Text)
var
    SelectAction: Page "Select Phone No. Action";
    PhoneNoAction: Interface "Phone No. Action";
begin
    SelectAction.Intialize(PhoneNo);
    if SelectAction.RunModal() = Action::OK then begin
        PhoneNoAction := SelectAction.GetResult();
        PhoneNoAction.Handle(PhoneNo);
    end;
end;

It should be pretty easy to extend this in the future. Let’s say that you want to add an option to call the number or send a message. Just:

  1. Extend the enum with the new action
  2. Create a new codeunit which implements the interface and makes the phone call, sends the message or does whatever you need
  3. Link your new enum value with the new codeunit
  4. (there is no step 4, just three easy steps)
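
As a rough sketch of those steps from another extension: the object numbers, names and the use of a tel: link are assumptions for illustration.

```al
enumextension 50120 "Call Phone No. Action" extends "Phone No. Action"
{
    value(50120; "Call the number")
    {
        Caption = 'Call the number';
        // Step 3: link the new value to the codeunit that handles it
        Implementation = "Phone No. Action" = "Call Phone No.";
    }
}

codeunit 50120 "Call Phone No." implements "Phone No. Action"
{
    procedure Handle(PhoneNo: Text)
    begin
        // Assumption: the user's device has an app registered to handle tel: links
        Hyperlink('tel:' + PhoneNo);
    end;
}
```

The new option shows up as an extra radio button on the confirmation page and the existing Select procedure calls the new codeunit, without either of them being changed.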

Conclusion

Obviously, this is a lot more work than just using StrMenu, especially if some of these concepts are new to you. I’m not going to tell you that you ought to do this or that it is necessarily always worth the extra effort – but it’s good to have the option.

Maybe you’d start with a StrMenu and later refactor it into a confirmation dialog when you know that you’ve got a case for it. Go wild.