<Meta title="Guides/Testing" parameters={{ viewMode: 'docs', previewTabs: { canvas: {hidden: true}, }, }} />
canvas-kit
has 3 levels of testing: unit, functional and visual. Each area has special use-cases
even though tests at various levels may overlap. Here's a simplified explanation of what each level
is meant to test:
- Unit: This level is meant to test the input/output of units of functionality (e.g. utility functions or input props of components that output DOM).
- Functional: This level is browser-based testing and is where UI behavior and accessibility requirements are specified (e.g. Given a modal example, When clicking on the button, Then the modal should be open). Functional tests are against Storybook Stories.
- Visual: This level is where all styling is tested. Some props only have visual effects and it is difficult to reason about at any other level. All Storybook stories can be candidates for visual regression based on an opt-in parameter.
Tests are supposed to be a source of feedback and confidence. To achieve that, a test should be clear about intent and failures. Developers tend to ignore passing tests and focus only on failing tests. If the intent of the test cannot be determined, it is difficult to fix the test. If the failure isn't immediately obvious, too much time is spent fixing the test. Canvas Kit adopts a BDD (behavior-driven development) approach of testing. The most common way to think is about behavior is "given/when/then". These behaviors are a series of specifications and not necessarily "tests". Testing in this repo is not just testing the correctness of our components, but running a series of specifications. Specifications tend to be more granular to give context of what code is expected to do. A "test" simply has to pass. A specification requires meaning.
Bad:
test('SomeComponent should render correctly', async () => {
const {getByTestId} = render(<SomeComponent data-testid="test" text="foo" />);
const component = getByTestId('test');
expect(component.textContent).toEqual('foo');
expect(component.getAttribute('aria-label')).toEqual('foo');
});
This is a "test" and not a specification, but a test that asserts multiple outputs. If any of the
expectations fail, the CI will output something like
Failed: SomeComponent should render correctly
. The message doesn't give any indication of why any
of the expectations would fail. Furthermore, the failures don't give much indication as to what is
failing. Without looking at the code, the failure message of the expectation will say something like
"Expected 2 to equal 1" or "Expected 'bar' to equal 'foo'". Often to find out why the test fails,
the source code of the test has to be consulted and the test has to be debugged. After that, there
is no context to why the failure even matters. This test doesn't give much feedback or confidence on
failure and thus isn't a very useful test.
Good:
describe('SomeComponent', () => {
it('should render the "text" prop as the text', async () => {
const {getByTestId} = render(<SomeComponent data-testid="test" text="foo" />);
const component = getByTestId('test');
expect(component).toHaveTextContent('foo');
});
it('should render the "text" prop as an aria-label for accessibility', async () => {
const {getByTestId} = render(<SomeComponent data-testid="test" text="foo" />);
const component = getByTestId('test');
expect(component).toHaveAttribute('aria-label', 'foo');
});
});
This series of specifications give more context to the test. It is more code, but it is clearer what
the intent of each specification is. If one fails, it will be easier to diagnose what and why. The
assertions also make use of jest-dom
which gives more context into how it fails. Instead of a
generic assertion, the CLI will output something like
Expected <div data-testid="test" aria-label="bar">bar</div> to have an "aria-label" of "foo" but received "bar"
.
The specification should tell us which spec failed and the assertion gives insight into how.
Accessible components tend to have role
s and ARIA attributes to help with selection of specific
elements within a component. React Testing Library comes with some selection helpers based on these
attributes. CSS class names should not be used for selecting components. data-testid
should be
used as a last resort and usually is reserved for examples or an application to help disambiguate
components. Hard-coded data-testid
should be avoided in component source code, but are fine in
unit and functional tests provided the test supplies the data-testid
to make the test more
understandable. Additional info can be found
here.
Canvas Kit uses Jest, React Testing Library, and jest-dom which have a nice API for testing React components. Unit tests should test inputs and outputs of utility functions or props and output of components. For example, if documentation says that a component will pass extra props to a container component, there should be a specification that verifies this claim. If a component's documentation states that children will be rendered, a test should verify the children exist within the containing component.
Good unit tests avoid too much implementation detail. For example, if a component states it will render the children prop, the documentation and specifications shouldn't state where unless it is specific to the functionality of the component. Tests here should define the API surface area that will be guaranteed. The more implementation details, the more difficult the tests are to maintain.
Bad:
const {container} = render(
<SomeComponent>
<div data-testid="test">Test</div>
</SomeComponent>
);
expect(container.children[0].children[0].textContent).toEqual('Test');
This test encodes the DOM structure into it. Any changes to the source code will require a change to the test making the test tightly coupled and fragile.
Good:
const {getByTestId} = render(
<SomeComponent data-testid="container">
<div data-testid="test">Test</div>
</SomeComponent>
);
const component = getByTestId('container');
const child = getByTestId('test');
expect(component).toContainElement(child);
This example is more flexible. The component can be refactored with DOM elements added or removed
and the test will still pass as long as the div
is a child of the component somewhere (the
specification is met). Also this example will have a more useful failure message since
.toContainElement
has the context that it is expecting an element in another element vs a match of
a string.
Canvas Kit does not contain DOM-based snapshot tests and uses Visual Tests instead. DOM snapshots failures are often difficult to parse. Humans tend to be better at noticing and discerning visual changes than changes to a DOM structure.
If your project uses snapshot tests and Canvas Kit, chances are you'll run into issues with changing ids and other ARIA attributes. Canvas Kit generates unique ids that are different every time the page loads. This can be a problem with snapshot tests. To fix this, you'll need to add special code to your test bootstrap file. For example:
import {setUniqueSeed, resetUniqueIdCount} from '@workday/canvas-kit-react/common';
beforeEach(() => {
setUniqueSeed('a'); // set a static seed
resetUniqueIdCount(); // reset the id counter before every test
});
This will ensure snapshot tests have stable ids for each snapshot. It is still possible to get ids changing if you add an additional component that uses id generation - subsequent ids will be different, but this will prevent snapshot tests that don't have any changes from showing diffs.
Canvas Kit uses Cypress for browser-based behavior testing (additional info: Why Cypress?). Functional tests ensure the code meets functional specifications from a user's perspective. All functional tests are written against a Storybook Story. Specifications can come from many different stakeholders including product management, user research, and accessibility. A functional test should read in the Given/When/Then format and fit in well as acceptance criteria.
Example of a usability specification: "Given the modal example, when a user clicks the 'Delete Item' button, then the modal opens".
Example of an accessibility specification: "Given the modal example, when a user clicks the 'Delete Item' button, then the focus should be transferred to the first focusable element in the dialog"
The BDD-style (behavior-driven development) syntax lends well to the nesting of actions performed on a example (also called "story" borrowed from Storybook nomenclature). The wording used inside the BDD blocks should read in plain English and be understandable by non-developers. The implementation should be as readable as possible by non-developers for trust and verification. The intent of the wording of the BDD-style syntax is to parse into an outline form and go into documentation as a verification of what specifications are covered.
Example:
describe('Given the Default Modal is rendered', () => {
beforeEach(() => {
// render the default modal example
});
context('when the target button is clicked', () => {
beforeEach(() => {
// click the target button
});
it('should open the modal', () => {
// expect modal to be open
});
});
});
The outline would look like:
- Given the Default Modal is rendered
- when the target button is clicked
- then it should open the modal
- when the target button is clicked
Tests should rely on ARIA attributes as much as possible for selecting and asserting. Test IDs should be used only to disambiguate DOM elements.
Read Intro to Cypress for an intro to how Cypress works. The core concept to Cypress is enqueued command chains that typically wrap jQuery collections. Cypress uses jQuery, Mocha, Chai, Sinon, Lodash and Moment. All are exposed to the developer of the test.
Let's break that down. Cypress has a jQuery-like chainable API to select and interact with DOM nodes. Cypress controls when and how often a command is run to ensure assertions either pass or time out. This is most confusing to those who approach Cypress commands as Promises. Commands are not Promises.
For example, imagine the following Cypress code:
cy.get('body')
.contains('button', 'Delete Item')
.click();
cy.get('[data-testid="MyResult"]').should('contain', 'Success!');
The Cypress Command queue will show ['get', 'contains', 'click', 'get', 'should']
, but it will not
run these commands immediately. There is also no need to use .then
to wait for any previous
condition to be met first. The Cypress run-time will do the following:
- Run
$('body')
every 50ms until$('body')
returns a non-empty jQuery collection. - Run a function that gets all
button
elements inside of thebody
element that have text content that contains 'Delete Item' and that jQuery collection is not empty. - Ensure the previous element is clickable (visible, not covered, has actual width/height), waiting 50ms to detect animations (waiting for animations to complete), etc. Basically, make sure a user could actually click on it.
- Run
$('[data-testid="MyResult"]')
every 50ms until it returns a non-empty jQuery collection. Usually clicking has side-effects like network requests to the server before the side-effect is reflected in the DOM. Since Cypress implicitly retries, this command will eventually return a non-empty jQuery collection and the author doesn't need to explicitly do anything. Note: this selector could return a non-empty jQuery collection instantly, but it might not contain "Success!" yet... - Run internal
chai
+chai-jquery
matchers (expect($el).to.contain('Success!')
) until an error is no longer thrown..should
will re-run the previous command and the assertion until the condition is met or a timeout occurs. This is the secret sauce to the stability of Cypress tests compared to WebDriver..should
can also receive a function that will be run until no error is thrown. If an error is never thrown, it will only be run once.
Cypress is very easy to get started with, but there isn't enough documentation about how to scale a complicated suite of Cypress tests.
Component helpers should be written to help with automation. These component helpers are useful for a more readable implementation as well as more maintainable test code. Component helpers can also be exported by this repository for use in downstream testing (e.g. Cypress tests in applications that use Canvas Kit). Helpers contain implementation details about how components are composed internally. The goal is that if the component's implementation changes, the component helpers should be the only code that needs to change to keep tests passing.
Component helpers come in 3 flavors:
- Starter: This helper type starts a Cypress chain. The return type is always
Cypress.Chainable<JQuery>
and usually contains acy.get
and the selector usually contains something that is unique to that component type. For example, the Modal helper uses[role=dialog]
with an option to disambiguate with adata-testid
. - Scoping: Imagine a web application is a collection of boxes that determine how components of an application are laid out. A scoping helper takes in a larger box and returns a smaller box, further reducing the scope. Ideally these helpers contain no Cypress commands (which are asynchronous), only jQuery calls (which are synchronous). Synchronous helper functions can be run more than once at the discretion of the Cypress runtime. A good use-case for this type is to replace CSS selectors.
- Action: Cypress comes with many baked in action commands such as
click
andtype
. A custom action is useful to give a higher level semantic meaning to a series of low-level commands. For example, a Select component might compose a Popup and a virtualized menu. This action may include opening the select, finding the popup element, seeing if the desired item is currently visible, scrolling the virtualized menu until the desired item is visible and clicking on the item. This code would be repeated for every instance and has implementation knowledge of the component. A high-level semantic action might be called "select" that wraps up this implementation detail to free up automation and development engineers to think about the test case and not about implementation details.
Canvas Kit uses ChromaticQA for visual tests which are run on stories that are opted-in through a special parameter. All the visual states should be represented at least one story. This way all of the visual states of a component can be visually regression tested without requiring the test to interact with the UI. These states should include loading states, error states, etc. Stories created for visual regression tests should avoid dynamic solutions like knobs.
To make a story runnable in Chromatic for visual testing, add the following to the story's parameter list:
// default Storybook API
storiesOf('Some Category', module)
.add('My Visual Story', () => {
/** story contents */
})
.addParameter({
chromatic: {
disable: false,
},
});
// CSF - All stories in file
export default withSnapshotsEnabled({
// CSF details (title, component, etc.)
});
// or CSF - Specific story
export const MyVisualStory = () => {
// story contents
};
withSnapshotsEnabled(MyVisualStory);
Not all visual states are application states (E.g. focus
, hover
, and active
on a button
element). For these pseudo-selectors, the StaticStates
wrapper can be used where pseudo-selectors
are turned into CSS class names instead (E.g. :focus
turns into .focus
) and a class name can be
added to the component. Note: this turns off the psuedo-selectors entirely.
The following will render a button where the focus styling is always applied:
export const States = () => {
<StaticStates>
<SecondaryButton className="focus">My Button</SecondaryButton>
</StaticStates>;
};
Note: This will only work correctly if the component passes on extra attributes to a container that contains pseudo-selector code (most components do this). If a pseudo-selector is defined on an element inside the container, this won't work.
StaticStates
allows for visual styling, like focus rings, to be tested automatically.
Some components may be harder to test statically due to animations or require user interaction normally. For example, Toasts animate in and Tooltips require a user to mouse-over the target element. Components should be written in a way that allows these animations to be disabled.
- Unit tests should cover APIs exposed to developers
- Functional tests should be written as a series of specifications of behavior and accessibility
- Visual tests should cover all visually representable states of components