- How to set up your environment to create Alexa skills using the new ASK SDK controls framework (beta) and our Alexa Skills Toolkit for Visual Studio Code as the only requirements.
- How to use the code editor to bootstrap a skill
- How to generate the voice interaction model and Alexa Presentation Language (APL) documents from source code
- How to set up regression tests
- How to componentize your skill request handlers and make them object-oriented (ES6)
- How to leverage your local environment for debugging.
Buckle up and let’s get started!
Setting up Visual Studio Code
The first step is to make sure that we have proper code editor support to develop our skill since we will be conducting all skill building activities from it.
Our preferred tool for VS Code integration is the Alexa Skills Toolkit for Visual Studio Code which we bumped up to support the latest ask-cli v2.x project structure, an embedded simulator, local debugging and APL document preview. You can learn more here, but essentially the prerequisite is having git and Visual Studio Code installed (the ask-cli is no longer required) and then just installing our extension from VSCode.
Once the extension is installed you will notice a new Alexa logo in the left-hand activity bar of Visual Studio Code. Click on it and sign in to your Amazon Developer account.
Now VS Code and your developer account are linked so that skill deployments, for example, will be done on that account.
Creating a Skill
The Alexa Skills Toolkit gives us the ability to create an end-to-end Alexa-hosted skill from VS Code. Let’s go ahead and create one as shown below (I name the skill Hello World as that’s the skill template that we get by default):
Note: you should add node_modules to a .gitignore file in your Alexa-hosted skill since package.json will be used to install all dependencies when the AWS Lambda function is deployed.
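For example, a one-line .gitignore inside the lambda directory does the trick:

```
node_modules/
```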
Once you’re done, you’ll end up with a minimalist Hello World sample skill that now sits on your local machine, but is also deployed to the Alexa service in a development stage. Since this is an Alexa-hosted skill we not only take care of hosting the skill front-end (the voice interaction model) but also provision the back-end for you as an AWS Lambda function. On top of that, all the code is backed by a git repository so you don’t have to worry about hosting space or version control. You just focus on evolving your skill locally and every time you do a
git push on an Alexa-hosted skill, we take care of the deployment.
Generating the Voice Interaction Model
Now that we have our basic skill deployed and have the code on the local machine, we'll leverage the SDK Controls Framework to generate the voice interaction model (I'll give you more information about this SDK in the next section so bear with me). Let's start by adding the
ask-sdk-controls NPM module as a dependency to our package.json:
As you've probably noticed we also use the opportunity to bump up our
ask-sdk-* packages to the latest version.
Next, let's create a new file in the lambda directory (I will call it
build_interaction_model.js) and paste the code below into this file:
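The file's contents aren't reproduced inline; the sketch below shows the general shape using the framework's ControlInteractionModelGenerator. Method names follow the Controls user guide, and the intent list and sample utterances are illustrative assumptions:

```javascript
// build_interaction_model.js -- sketch: generates the en-US.json
// voice interaction model for Hello World from scratch.
const { ControlInteractionModelGenerator } = require('ask-sdk-controls');

new ControlInteractionModelGenerator()
    .withInvocationName('hello world')
    .addIntent({ name: 'HelloWorldIntent', samples: ['hello', 'say hello', 'tell me hello'] })
    .addIntent({ name: 'AMAZON.HelpIntent' })
    .addIntent({ name: 'AMAZON.CancelIntent' })
    .addIntent({ name: 'AMAZON.StopIntent' })
    .addIntent({ name: 'AMAZON.NavigateHomeIntent' })
    // Write the model where the ask-cli v2 project structure expects it.
    .buildAndWrite('../skill-package/interactionModels/custom/en-US.json');
```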
In this piece of code, we’re effectively generating the voice interaction model for Hello World from scratch. The model generator supports more than the basics shown here, such as slots, dialogs, validators, prompts, etc (if you’re interested please see this page) but these are not necessary for such a simple model.
Let's now add a script in the scripts section of package.json that allows us to generate en-US.json on demand (let's name it buildmodel):
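Assuming the file name used above, the new entry is a one-liner (merge it into the existing scripts section):

```json
"scripts": {
  "buildmodel": "node build_interaction_model.js"
}
```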
Note: we're overwriting the existing model JSON file here, so be careful. If you want to incorporate a previous model and build on top of it with the model generator, you can feed in an existing model using the method
loadFromFile as shown here.
We can run this model creation script manually before deploying a skill as seen above, or add it to a prebuild automation process that calls
npm run buildmodel before pushing the skill. For example, we could add a git pre-commit hook using Husky by adding this to our package.json:
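A sketch of the corresponding package.json fragment, using the Husky v4 configuration style (the hook contents are assumed from the workflow described above):

```json
"husky": {
  "hooks": {
    "pre-commit": "npm run buildmodel"
  }
}
```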
Note: Husky is an NPM package that allows you to easily piggy-back on the vast majority of existing git hooks and is normally used to prevent bad commits by checking for style, secure code practices, etc.
Leveraging SDK Controls in Your Back-end
Now that we have added SDK Controls to our back-end we have a great opportunity to leverage other aspects of the framework (including its object-oriented nature; it's natively built with TypeScript). One of the most important objectives of the framework is to encourage the development of reusable components, called Controls, that encapsulate portions of the skill's state and dialog logic such that they can be developed independently and then composed together to build an entire skill. The framework provides patterns for building and grouping controls following the MVC (Model-View-Controller) pattern and defines the rules for how Controls should interact.
We’re not going to leverage all the potential of Controls here—we’ll just use them to define the request handlers of our Hello World back-end code. We’ll only be scratching the surface of the framework and focusing on leveraging its object-oriented nature to create our request handlers to improve code reusability (you’ll notice we use less code than in the standard Hello World back-end code).
Let me show you the new code of
index.js and then explain it:
Note: source code available here. Issues with the source code? Just contact me via Twitter: @germanviscuso
As seen in the source code above, the
ask-sdk-core package is imported as usual (we need it to get a custom skill builder near the end of the file) but we now import new types from the Controls Framework package (second line of code).
Every Control implements the
IControl interface and typically extends the
Control abstract base class to obtain default functionality. Since all Controls support the functions
canHandle and handle, they can be direct replacements for request handlers (with the help of a runtime
ControlHandler adapter as you can see near the end of the file).
Right after we require the necessary elements, we create a new abstract control class (inheriting from Control) called
LiteralContentControl. This class defines a control that can provide a literal Alexa response (a string) and a flag on whether to close the session, but notice that it does not define a
canHandle (we’ll leave that to the subclasses, more on this below). Notice these fundamental aspects of a Control:
- The input parameter (a
ControlInput) gives you access to the incoming request data.
- The result builder (a
ControlResultBuilder) lets you build a skill response.
- You add specific responses to the result via a system act.
The framework separates response logic from rendering. A system act is a simple object representing “something to communicate to the user”. In this case the
LiteralContentAct is used to communicate some arbitrary literal content. The framework defines various system acts that have precise purposes and developers can create new ones too. System acts are an intermediate representation that are converted to a complete skill response. In the case of
LiteralContentAct, a literal response is directly attached to the response output in the control’s rendering phase, see the diagram below to learn about all phases in the framework.
So let’s use our new
LiteralContentControl to define a Control that will handle a launch request. As you can see in the code, right after defining our abstract class we just define a subclass that implements only the missing part, the
canHandle, giving us good reusability. Note that
InputUtil comes in handy as it defines a set of static functions to check on the nature of the request. After that we do the same with the rest of the request handlers (we'll define the actual output speech string of each response when we instantiate the classes). We have a special case where we can't reuse our
LiteralContentControl in our reflector handler
IntentReflectorControl because a reflector handler includes a variable in the output string that is the intent name and we can only get it at runtime. That’s why, in this case, we just subclass
Control (we leave it as an exercise for the reader to adapt
LiteralContentControl to handle a variable coming from the request data).
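The snippets in this post live in the linked repository rather than inline, so here is a dependency-free sketch of the pattern just described. The Control base class and the result builder object below are simplified stand-ins for the real ask-sdk-controls types (the real ones do much more); only the subclassing shape matters:

```javascript
// Dependency-free sketch of the Control pattern described above.
class Control {
    constructor(id) { this.id = id; }
}

// Abstract control that responds with a fixed literal string and a flag
// deciding whether the session should be closed. canHandle is deliberately
// left to subclasses.
class LiteralContentControl extends Control {
    constructor(literalContent, endsSession) {
        super(new.target.name); // the subclass name doubles as the control id
        this.literalContent = literalContent;
        this.endsSession = endsSession;
    }
    handle(input, resultBuilder) {
        // In the real framework this would add a LiteralContentAct system act.
        resultBuilder.acts.push({ type: 'LiteralContentAct', content: this.literalContent });
        if (this.endsSession) resultBuilder.sessionEnded = true;
    }
}

// Subclasses only contribute canHandle, which gives us good reusability.
class LaunchControl extends LiteralContentControl {
    canHandle(input) {
        return input.request.type === 'LaunchRequest';
    }
}

// Minimal usage example with a fake input and result builder.
const control = new LaunchControl('Welcome, you can say Hello!', false);
const input = { request: { type: 'LaunchRequest' } };
const resultBuilder = { acts: [], sessionEnded: false };
if (control.canHandle(input)) {
    control.handle(input, resultBuilder);
}
console.log(control.id, '->', resultBuilder.acts[0].content);
```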
Now it’s time to orchestrate all these Controls, and for that the framework resorts to a root control. Fortunately, the SDK Controls framework supports parent-child relationships via aggregation. So we then create a root control called
HelloControl subclassing a predefined control type called
ContainerControl. This is ideal for aggregating children and delegating functionality since
canHandle will be called automatically on the children. As you can see in the code, we then instantiate each control with the specific output speech literal and pass a flag on whether to close the session. Note that every control instance must have an id, which is usually passed on instantiation, but we have provided a workaround so that we don't need to pass it in the constructor: the name of the subclass is automatically assigned as the control id. This only works if you have singletons of these classes, which is the case in our code; in general you might be better off explicitly assigning ids to your controls as expected.
We're almost done now! We have child controls that can serve as specific request handlers which we have aggregated in a root control, but Controls can't work in isolation. Something at runtime needs to orchestrate them. Enter the ControlManager.
A ControlManager is a high-level organizer authored by the skill developer. It interfaces directly with the framework's runtime (
ControlHandler) and takes care of high-level behaviors. One of its key tasks is to create the tree of controls at the start of each turn. So after defining all handler classes, we create the control tree in the manager by just instantiating our root control.
Finally, the framework provides the
ControlHandler, which is a standard ASK SDK request handler. A
ControlHandler can be added just like any other
RequestHandler. Skills can freely mix-and-match regular request handlers with control handlers. So in order to work as a standard request handler we need to wrap our
ControlManager in a
ControlHandler (near the end of the file).
To wrap up, near the end we create a function that can be used as an entry point by AWS Lambda (as usual). What is not usual is the last line of code: it exports the
HelloManager for use in regression tests, which brings us straight to our next section.
The following diagram shows the high-level runtime flow in the SDK Controls framework as a
Request is consumed and a Response is produced:
Note: for more information on these phases, you can check the User Guide.
Adding Regression Tests
Yet another interesting feature of the SDK Controls framework is its set of support classes for regression tests. When combined with familiar test packages, such as the assertion library chai and the test framework mocha, they give us a very useful test harness.
Let’s start by upgrading
package.json to add new development dependencies and also provide a proper script for running the tests (plus we add support for running the tests to the Husky pre-commit hook). The file
package.json will now look like this:
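The exact file isn't reproduced here; the relevant additions look something like this (version numbers are illustrative):

```json
"scripts": {
  "buildmodel": "node build_interaction_model.js",
  "test": "mocha"
},
"devDependencies": {
  "chai": "^4.2.0",
  "mocha": "^7.1.2",
  "husky": "^4.2.5"
},
"husky": {
  "hooks": {
    "pre-commit": "npm run buildmodel && npm test"
  }
}
```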
Note: we have added the chai and mocha development dependencies, as well as a test script and a pre-commit hook entry
Now let’s create a
test subdirectory inside the
lambda directory and create a file called
test1.js in it with this content:
Above we use mocha, chai and a bunch of support classes provided by our framework which we’ll now explain. The
waitForDebugger utility adds a pause to give the Visual Studio Code debugger time to attach (but this is not necessary for VSCode version 1.41.1 and above).
As you can see in the code, the SDK Controls Framework provides you with testing support. Here, in each one of the two tests, a
SkillTester is initialized with a reference to a
HelloManager (this is why we exported
HelloManager at the end of
index.js as shown before).
SkillTester can run a turn through the skill code and validate that a particular prompt is produced.
SkillTester.testTurn() takes three parameters: the first is the user utterance (which is left blank when testing a launch request). The second parameter is a
ControlInput that will be the input for the turn. The
TestInput class has utilities to create the most common kinds of input such as a
LaunchRequest and APL
UserEvents. The third parameter is the prompt that the skill is expected to produce. In both tests we’re looking for a specific output that should match what we’re sending as a response from the back-end code.
After testing a turn, further validations can be made by inspecting various objects. For example, note that at the end of each test we check whether the session is being ended or remains open. The second test in particular mimics the situation when a user invokes the skill with “U: Tell Hello World Hi” and demonstrates how to create a simple
ControlInput that wraps an IntentRequest.
Let’s run the tests via the command line with
npm test (don’t forget to install the new dependencies with
npm install first):
Regression tests of this kind are extremely useful for validating a multi-turn skill and preventing regressions. We recommend a workflow that uses regression tests to validate key scenarios and then live testing to verify integrations and explore new dialog patterns.
Enabling Local Debugging
During Alexa Live we also announced a new functionality that allows you to have Alexa requests from the Test simulator (now also embedded in the Toolkit!) invoke skill code running right on your local machine. This allows you to quickly verify changes and inspect skill code with breakpoints using VS Code's debugger.
Now, you don't need to set up a connection tunnel to your local machine by yourself or change any endpoint URL in the skill configuration. We're doing all of this automatically for you! All you have to do is install a single NPM package and set up local debugging in VS Code. To test our skill with the Lambda code running locally in Visual Studio Code, you can follow the steps outlined in Test your local Alexa skill, but it basically comes down to including the ask-sdk-local-debug development dependency in
package.json by running npm install --save-dev ask-sdk-local-debug,
and then creating a launch configuration in VS Code by creating a
launch.json file and adding a new Alexa Skills Debugger configuration (via the Add Configuration button or autocomplete). I summarize the configuration process here.
- Local debugging is only available for a skill’s development stage.
- All Alexa requests for the development stage skill will be routed to your development machine while the connection is active.
Now you can start a debugging session by just clicking on the Debugger icon of VS Code (fourth from the top, on the left activity bar) and then on the Play button (green triangle) of the VSCode debugger (Debug Alexa Skill). After that the debugger will be listening locally and you can go and open your skill via voice (use the Test tab of the Developer Console as the sessions there do not expire in practice which gives you enough time to set breakpoints and debug line by line).
While we’re at it we can augment VS Code’s launch configuration (
.vscode/launch.json) to allow us to also launch the tests and the model builder from the debugger screen (I took those launch configs from the Fruit Shop sample):
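A sketch of the two additional entries for the configurations array of .vscode/launch.json (the names and program paths are assumptions; adapt them to your layout):

```json
{
  "type": "node",
  "request": "launch",
  "name": "Run skill tests",
  "cwd": "${workspaceFolder}/lambda",
  "program": "${workspaceFolder}/lambda/node_modules/mocha/bin/_mocha"
},
{
  "type": "node",
  "request": "launch",
  "name": "Build interaction model",
  "cwd": "${workspaceFolder}/lambda",
  "program": "${workspaceFolder}/lambda/build_interaction_model.js"
}
```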
Note: you’ll need an extra dependency to make these additional launch configurations work. Make sure to do this before trying setting them up:
npm install --save-dev source-map-support
We recommend using local debugging when doing initial live testing of your skills, as it provides fast and easy access to log messages, and the ability to hit breakpoints can prove invaluable.
Generating APL Visuals From Code
If you’re a React developer you’ll be happy to know that we’re providing a new framework to generate Alexa Presentation Language (APL) documents with JSX. If you’re not familiar with this however, don’t worry, just keep reading!
In a nutshell, our new framework called JSX for APL (beta) is a React-based APL templating framework that allows developers to define APL documents within code. By using the React-style JSX/TSX file format, developers can include JSX-based APL components as XML-style definitions and shorten the APL definition code, making development more manageable.
In order to use it we just need to install the following dependencies by doing:
The
ask-sdk module is a dependency of
ask-sdk-jsx-for-apl and, because it's a superset of ask-sdk-core and
ask-sdk-model, we can delete these two from package.json. Since the JSX code needs to be transpiled before it can run on AWS Lambda, we'll move the contents of the
lambda directory with our original files to a directory called
source (where you'll do your development from now on). And every time we transpile the code by doing
npm run build, we’ll generate the
lambda directory from scratch with the transpilation output code ready to deploy with
git push. In order to support this, this is what the
package.json file looks like now:
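A sketch of the relevant package.json sections after the move to a source directory (the Babel-based build script is an assumption; any transpiler that outputs plain Node.js into lambda works):

```json
"scripts": {
  "build": "babel source --out-dir lambda --copy-files",
  "test": "mocha"
},
"devDependencies": {
  "@babel/cli": "^7.10.1",
  "@babel/core": "^7.10.2",
  "@babel/preset-env": "^7.10.2",
  "@babel/preset-react": "^7.10.1",
  "chai": "^4.2.0",
  "mocha": "^7.1.2",
  "husky": "^4.2.5"
},
"husky": {
  "hooks": {
    "pre-commit": "npm run build && npm test && git add lambda"
  }
}
```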
Note: Alexa-hosted skills, unlike self-hosted skills, do not allow you to specify a path to the lambda code in the skill config files or take the code from a top level
build directory as seen here, so your best bet is to have your source code in a different directory before transpiling and then place all the transpiled files inside
lambda. I automate this process by doing an
npm run build in the Husky pre-commit hook as seen above. What we lose here is the ability to debug the original code directly. The debugger will work over the transpiled files in
lambda so that’s where you have to place your breakpoints.
In order to fully support the JSX syntax you’ll also need to add this
.babelrc file to the source directory:
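A minimal .babelrc that handles both modern JavaScript and the JSX syntax (the preset choice is an assumption based on a standard React setup):

```json
{
  "presets": ["@babel/preset-env", "@babel/preset-react"]
}
```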
Now we can create an APL document using the
APL component and
MainTemplate component. Go ahead and create an
apl subdirectory inside the
source directory and in it create a file called
helloapl.js with this content:
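Since the file isn't reproduced inline, here is a sketch of what it can contain; the class name HelloAplDocument, the component props, and the “Alexa blue” hex value are assumptions:

```jsx
import * as React from 'react';
import { APL, MainTemplate, Container, Text } from 'ask-sdk-jsx-for-apl';

// Renders an APL document with a centered "Hello World!" text block.
class HelloAplDocument extends React.Component {
    render() {
        return (
            <APL theme="dark">
                <MainTemplate>
                    <Container alignItems="center" justifyContent="center">
                        <Text text="Hello World!" fontSize="50dp" color="#00CAFF" />
                    </Container>
                </MainTemplate>
            </APL>
        );
    }
}

export { HelloAplDocument };
```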
Note: Without the transpilation process, the import/from syntax and the special <> tags would not be recognized when pushing the code to the AWS Lambda function.
We just created a subclass of
React.Component which renders an
APL component containing a
MainTemplate that includes a basic
Container with the
Text “Hello World!” in font size 50 and the Alexa blue color. Easy as pie!
Now let’s render this APL document in our skill every time it responds with a “Hello World!”. For that we need to modify
index.js by adding the following new requires:
And, most importantly, we need a new abstract
AplControl class that leverages our
LiteralContentControl class and adds a hook on top to render the JSX based APL document (which we’ll pass in the constructor):
In the code above we just override
renderAct (which by default would just have the associated act render itself as seen in the last return statement) so we can access the
responseBuilder and append a directive to render the APL document (we of course first check if APL is enabled). Notice the
getDirective function: JSX for APL not only generates the APL document for us but also the full directive to pass it directly to the response builder!
HelloIntentControl now just changes its inheritance to
AplControl and gets the APL document as a parameter in the constructor:
Note: because we're using the <> React notation,
index.js now also needs to be transpiled in order to work as an AWS Lambda function. That's not a problem since in this project we're transpiling all files.
Before testing this out, make sure you enable the APL interface in the skill manifest (skill-package/skill.json):
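The fragment to merge into the manifest looks like this (only the interfaces entry is new; keep the rest of your file):

```json
{
  "manifest": {
    "apis": {
      "custom": {
        "interfaces": [
          {
            "type": "ALEXA_PRESENTATION_APL"
          }
        ]
      }
    }
  }
}
```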
Note: don’t overwrite
skill.json with the code above, you just need to add this interface in the specified location but keep the rest of the sections in the file.
Now you should go to the
source directory, update your skill as part of the natural development process, commit your changes with git (this will trigger everything I’ve shown you via a pre-commit hook) and then just
git push your changes or deploy via the Toolkit’s Deploy button (which is equivalent, both operations translate into a deployment in Alexa-hosted skills). From now on, as part of this setup, the
lambda directory and files will be generated and staged automatically on each commit, so you don't have to manually commit that directory.
Let’s deploy from the Toolkit by clicking on the Deploy & build button:
And now we’re ready to test the skill!
Open the embedded simulator in the Toolkit, enable the skill in the Development Stage using the top combo box and type “tell hello world hello”. You’ll see the “Hello World!” response and our APL document rendered on screen:
Note: Debugging the skill works exactly the same, but you'll have to start the debugger in Visual Studio Code first and maybe set some breakpoints before starting your manual test. If you're transpiling, the breakpoints have to be set in the transpiled code.
Wrap Up and What’s Next?
Now you know how to work with your Alexa-hosted skill with Visual Studio Code and git as the only requirements. You will be able to create, expand, test and debug your skills practically without leaving VS Code, including generating voice interaction models and APL documents from code.
If you want to follow DevOps best practices you might want to include CI/CD (continuous integration/continuous delivery) practices with this stack. We have covered multiple aspects of DevOps in Alexa skills in this re:Invent talk and there are great projects out there in the community showing multiple ways to DevOps your Alexa skill (such as this one).
Two great related projects created by my colleagues that you might want to check out are Local Persistence (in case you want to use a local persistence adapter to quickly check what is going on with your data) and Error Notifications (to closely monitor errors in your skill and get a notification via e-mail or Slack when something unexpected happens).
Finally, I highly encourage you to take a deeper look at the SDK Controls framework to find out about all the things that we did not cover in this blog post. For example, with Controls, you can quickly create a ListControl to offer the user a set of choices including actions the user can take over the items, validations and even functional APL support for the list!
PS: If you have issues with the source code in this post just contact me via Twitter: @germanviscuso
Happy coding! 🙂