Kernel Variant Jira Process

Kernel Variant Testing

The Challenge

Testing the kernel and its growing list of variants is an ongoing challenge. Developers find it painful when a merge request is held up awaiting test verification not only of the kernel but also of a variant counterpart tracker that was occasionally missed and may not even be triaged. Waiting on a late-discovered variant counterpart tracker leads to delays, more context switching, and more time spent rebasing. However, from a QE and customer-centric view, we cannot release kernel variants untested when a kernel patch may affect them, so it is imperative that we maintain a procedure to track kernel variant testing.

This issue has made itself largely visible in how we handled kernel-rt. There are many scenarios where a kernel and RT bug are filed for a single issue and must be manually kept in sync. Some improvements have been made to kernel webhooks and automations to discover and gate kernel MRs on kernel-rt counterpart trackers. However, this discovery procedure is not perfect as it depends on trackers having correct fields set (summary, fix version, and so on). As we ramp up additional variants, this problem grows larger and manual efforts to manage it cannot scale.

Moving Forward

Managing kernel variants like RT in bugs completely separate from the kernel itself has led to pain points for all members of kernel subsystems. Therefore, kernel variant testing will now be handled via linked RHEL Jira tickets of type "Task".

Variant Workflow Procedure

The following diagram shows how we will manage kernel and kernel variant test tasks in Jira:

In many cases where an issue comes in via the RHEL product (kernel), RHEL-RT product (kernel-rt), or RHIVOS product (kernel-automotive), a patch will be applied to the RHEL kernel to later be inherited by its respective kernel variants. In these cases, any issue or story that comes in, regardless of the originating kernel variant or product, will first be converted to a parent bug/story with component = kernel.

By doing this, we are declaring that the component of a parent bug or story reflects where the patch will be applied:

A bug or story with component = kernel reflects a patch being made to the main RHEL kernel tree
A bug or story with component = kernel-rt reflects an RT-specific patch being made only to the RT tree (pre-RHEL 9.3)
A bug or story with component = kernel-automotive reflects an automotive-specific patch being made only to the RHIVOS tree

The workflow outlined in this document is only applicable to a patch that must go into the kernel main tree. For a bug or feature where the patch only applies to an RT or automotive tree, please follow the standard RHEL development workflow for those components.

Labeling

A labeling system indicates which kernel variant testing needs to be performed: each label applied to the parent kernel issue spawns a linked task for the corresponding variant.

The labels used in this process are:

KWF:kernel-rt - to track kernel-rt testing
KWF:kernel-64k - to track kernel-64k testing
KWF:kernel-automotive - to track kernel-automotive testing

Any future variant added for kernel will follow the same naming convention.
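The naming convention can be captured in a small helper. This is a minimal sketch for illustration only: the function names are hypothetical and are not part of KWF; the only thing taken from this document is the "KWF:" prefix followed by the variant name.

```python
from typing import Optional

KWF_PREFIX = "KWF:"


def variant_label(variant: str) -> str:
    """Build the label for a variant name such as 'kernel-rt'."""
    return f"{KWF_PREFIX}{variant}"


def variant_from_label(label: str) -> Optional[str]:
    """Return the variant named by a KWF label, or None for any other label."""
    if label.startswith(KWF_PREFIX) and len(label) > len(KWF_PREFIX):
        return label[len(KWF_PREFIX):]
    return None
```

A future variant would get its label the same way, for example variant_label("kernel-64k") for the 64k variant listed above.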

All testing of the RHEL kernel itself will still be tracked on the parent bug/story as usual, whereas kernel variant testing is now tracked in tasks linked with an "is blocked by" relationship.

When a parent kernel issue has a linked variant task, KWF will block a kernel MR from merging until the parent ticket and all its linked variant tasks have completed test evaluation. That means:

The parent kernel issue has "Preliminary Testing: Pass" indicating the RHEL kernel has passed QE preliminary testing
Any linked variant tasks are "Closed" with a comment indicating either the test results or reason why testing was waived or not conducted

Automated Labeling

There are two cases where these KWF labels may be automatically added to a kernel issue:

If the kernel bug is a CVE, the "KWF:kernel-rt" label will automatically be added to track kernel-rt test verification.
Once a kernel MR is filed, if files belonging to a kernel subsystem are altered and that subsystem has the testVariants list defined in owners.yaml, then the respective KWF label(s) will be added to the parent kernel ticket to spawn and link the appropriate variant testing task(s).
- For example, testVariants: kernel-rt defined for a subsystem in owners.yaml means that all MRs affecting that subsystem will trigger the creation of a kernel-rt testing task
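The two automatic cases can be sketched as follows. This is an illustration under stated assumptions, not the real webhook code: the SUBSYSTEMS structure is a simplified, hypothetical stand-in for data loaded from owners.yaml (whose actual schema is not described here), and the path patterns are made up for the example. Only the testVariants key and the "KWF:" labels come from this document.

```python
from fnmatch import fnmatch
from typing import Dict, Iterable, List, Set

# Hypothetical, simplified stand-in for subsystem entries read from owners.yaml.
SUBSYSTEMS: List[Dict] = [
    {"name": "scheduler", "paths": ["kernel/sched/*"], "testVariants": ["kernel-rt"]},
    {"name": "docs", "paths": ["Documentation/*"], "testVariants": []},
]


def automatic_labels(changed_files: Iterable[str], is_cve: bool,
                     subsystems: List[Dict] = SUBSYSTEMS) -> Set[str]:
    """Return the KWF labels that would be added automatically to the parent issue."""
    files = list(changed_files)
    labels: Set[str] = set()
    if is_cve:
        # Case 1: CVEs always get a kernel-rt testing task.
        labels.add("KWF:kernel-rt")
    for subsystem in subsystems:
        # Case 2: the MR touches a subsystem that declares testVariants.
        touched = any(fnmatch(path, pattern)
                      for path in files
                      for pattern in subsystem["paths"])
        if touched:
            labels.update(f"KWF:{variant}" for variant in subsystem.get("testVariants", []))
    return labels
```

With the sample data above, an MR that changes kernel/sched/core.c yields the "KWF:kernel-rt" label, while an MR that only touches Documentation/ yields no labels.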

Manual Labeling

Developers and QE may also manually label a kernel issue to spawn kernel variant testing tasks. The KWF:* set of labels can be used any time the user believes a kernel variant is affected and wants to ensure that a testing evaluation occurs for that variant.

In particular, if an issue was originally filed against a variant like kernel-rt, but the patch must come in via the kernel, then the issue should be converted to a component = kernel parent issue and the "KWF:kernel-rt" label added so that a task can be spawned to track the testing of kernel-rt.

An added benefit of this component conversion is that the bug ID of the original issue becomes the bug ID that gets attached to advisories when addressed, allowing customers to continue tracking a fix against the bug ID they originally filed and may have linked to a customer case.

Synchronized Fields

When a kernel variant task is linked to a parent kernel issue, it will inherit the following fields from the kernel issue:

Fix Version
Pool Team
Security Level
QA Contact
- Note: tasks do not have a "QA Contact" field, so a Task’s "Assignee" becomes the QA Contact of the parent kernel bug/story

All of these field synchronizations are one-way from the kernel to its linked dependents. That is to say, any change to the fields above in the kernel ticket will propagate the changes to its linked variant dependents. However, a change to any of these fields in the linked dependents will not sync back to the kernel parent issue.
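The one-way synchronization can be sketched as a function that copies from the parent and never writes back. This is a simplified illustration, not the actual Jira automation: issues are modeled as plain dictionaries keyed by field name, which is an assumption made for the example.

```python
from typing import Dict

# Fields copied verbatim from the parent kernel issue to each linked task.
SYNCED_FIELDS = ("Fix Version", "Pool Team", "Security Level")


def sync_parent_to_task(parent: Dict, task: Dict) -> Dict:
    """Return a copy of `task` updated from `parent`; `parent` is never modified."""
    updated = dict(task)
    for name in SYNCED_FIELDS:
        if name in parent:
            updated[name] = parent[name]
    # Tasks have no "QA Contact" field, so the parent's QA Contact becomes the Assignee.
    if "QA Contact" in parent:
        updated["Assignee"] = parent["QA Contact"]
    return updated
```

Because there is no function going in the other direction, an edit made directly on a task is simply overwritten the next time the parent's fields are synchronized.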

Tasks have a limited set of fields:

Tasks do not have an ITM. The milestone set on the parent kernel issue must therefore reflect the estimated completion milestone for all work across kernel and its variants.
Tasks do not have a "Testable Builds" field. KWF will therefore post all testable builds for the kernel and its variants to the parent kernel issue.

QE Ownership

To ensure that kernel MRs are not gated for a prolonged period while testing of the kernel and its variants is synchronized, we request that a single functional QE own the parent kernel issue and all of its variant tasks.

As discussed in the prior section, Jira automations will ensure that the QA Contact for the parent kernel issue is synchronized to any linked variant tasks as the Assignee.

The assigned QE is the expert in this functional area and must decide what level of testing to perform for the kernel and its variants.

If the functional QE determines that testing needs to be performed on a kernel variant, then they must perform this testing, comment the test results, and close the linked task with resolution "Done".
If the functional QE determines that no additional functional testing is required for a kernel variant, then they will indicate this in a comment and close the linked task with resolution "Done".

RT

The functional QE owner of the kernel issue is responsible for all linked variant tasks, including RT.

For kernel-rt, CKI-RT testing will run against each kernel MR build, and internally the RHEL Real Time QE team runs a limited functional tier test suite against all kernel-rt candidate Brew builds. The functional QE owner of an RT variant task may decide that this coverage is sufficient for a given bug and use it as the basis for not executing any additional testing.

What remains important, however, is that the functional QE owner comments their test decision on every RT task they own and drives both the kernel issue and all linked variant tasks to completion.

Future Kernel Variants

It is up to both kernel developers and QE to ensure that appropriate levels of CI testing are added to both pre-merge and post-merge pipelines for each new kernel variant, to minimize the amount of manual testing required by the subsystem QE.

When Testing Fails

When QE runs testing and discovers a failure:

If the failure is on kernel, follow the standard RHEL development procedure of setting Preliminary Testing: Fail on the kernel bug. This will flag to both developers and kernel workflow automations that follow-up work, and potentially a new patch, is required.
If the failure is on a variant test task, leave the test task open and set Preliminary Testing: Fail on the parent kernel issue.

When a new patch is applied to a kernel MR, kernel webhooks will:

Post new artifacts to "Testable Builds" on the parent kernel issue
Reset Preliminary Testing back to Requested on the parent kernel issue
Transition any linked variant tasks back to "In Progress"

This means that any time new patches are applied to an MR, QE must reevaluate not only the kernel but any linked variant task as well. QE’s prior evaluation may have been that variant testing is not required, but this workflow errs on the side of caution by requiring review and task closure again any time a patch for the MR is updated, as new code may have been introduced that now warrants such variant testing.
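The reset performed when an MR is updated can be sketched as below. This is an illustration only, not the real webhook: issues are modeled as plain dictionaries, and appending new builds to the existing "Testable Builds" list (rather than replacing it) is an assumption.

```python
from typing import Dict, List


def on_mr_updated(parent: Dict, tasks: List[Dict], new_builds: List[str]) -> None:
    """Apply the three webhook actions after a new patch is pushed to the MR."""
    # 1. Post the new artifacts on the parent kernel issue.
    parent.setdefault("Testable Builds", []).extend(new_builds)
    # 2. Ask QE to evaluate the kernel again.
    parent["Preliminary Testing"] = "Requested"
    # 3. Put every linked variant task back in front of QE, even if it was Closed.
    for task in tasks:
        task["status"] = "In Progress"
```

Note that a previously closed variant task is reopened unconditionally, which is what forces the renewed review described above.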

Post-Merge

Once a kernel MR merges, the parent kernel ticket continues following the standard RHEL development procedure for final verification and release.

For kernel-rt prior to RHEL 9.3 where it is built as a separate component and shipped in its own advisories, the parent kernel ticket will additionally be attached to the respective kernel-rt advisory. Developers and QE may therefore see two "Errata Link" entries on these pre-9.3 kernel bugs.

For any linked kernel variant tasks, testing has been completed pre-merge and is not strictly required again for final verification. Thanks to the simplified states of the Jira "Task" type, these may remain Closed. However, QE may still conduct variant testing post-merge if they wish.

Closing Kernel Issues

If a kernel bug or story is closed with any resolution that is not "Done" (such as "Not a Bug", "Won’t Fix", etc.), any existing linked variant tasks will be automatically closed as well by Jira automations, matching the resolution of the parent kernel ticket. Conversely, when a kernel ticket is reopened, any linked variant tasks will also be reopened.
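The close and reopen propagation can be sketched as two small functions. This is a simplified model, not the actual Jira automation: issues are plain dictionaries, the sketch updates every linked task regardless of its current state, and the status a ticket returns to on reopening is passed in by the caller because it is not specified here. All of these are assumptions.

```python
from typing import Dict, List


def close_parent(parent: Dict, tasks: List[Dict], resolution: str) -> None:
    """Close the parent; for any resolution other than "Done", close linked tasks too."""
    parent["status"] = "Closed"
    parent["resolution"] = resolution
    if resolution == "Done":
        return  # this rule only propagates resolutions other than "Done"
    for task in tasks:
        task["status"] = "Closed"
        task["resolution"] = resolution  # match the parent's resolution


def reopen_parent(parent: Dict, tasks: List[Dict], reopened_status: str) -> None:
    """Reopen the parent and all linked variant tasks."""
    for issue in [parent, *tasks]:
        issue["status"] = reopened_status
        issue["resolution"] = None
```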

A Note for CVEs

The variant workflow procedure also applies to kernel CVEs addressed in RHEL.

For RHEL kernel maintainers, note that this procedure means there will no longer be separate kernel-rt CVE bugs to reference in either changelog or "Resolves:" line. Instead, the kernel CVE Jira ID shall be used for both. When a kernel CVE is for a pre-9.3 RHEL stream, there will be two Fixed-In Builds linked to the ticket: one for the kernel build, and one for the kernel-rt build. Internal automations will handle adding this kernel CVE to both kernel and kernel-rt advisories, and customer-facing CVE pages will continue to properly reflect the CVE fix status for both kernel and kernel-rt.

Example

Here is a hypothetical example of what the variant workflow would look like in Jira for a RHEL 8.10 stream kernel bug.

In this purely fictional example, a new bug for the kernel scheduler is discovered and a Jira bug is filed to address the issue in RHEL 8.10. A kernel developer from the scheduler subsystem comes up with a patch and files a merge request, which gets linked to the Jira. Since this merge request touches the scheduler subsystem, and owners.yaml has testVariants: kernel-rt defined for scheduler, the label "KWF:kernel-rt" is automatically applied to this bug within minutes of the MR filing.

This "KWF:kernel-rt" label causes a new kernel-rt test task to be spawned with the same summary as the kernel issue but prepended with "[KWF:kernel-rt]" and linked to the parent kernel issue with a blocking relationship. KWF will now block the MR from merging until the parent kernel issue is marked with Preliminary Testing: Pass and the linked kernel-rt variant task is transitioned to "Closed".
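The shape of the spawned task can be sketched as follows. This is a hypothetical illustration rather than the real automation: the dictionary keys, the issue key "RHEL-0000", and the single space between the bracketed label and the summary are all assumptions.

```python
from typing import Dict


def spawn_variant_task(parent_key: str, parent_summary: str, label: str) -> Dict:
    """Describe the Task that a KWF label would spawn for a parent kernel issue."""
    return {
        "type": "Task",
        # Same summary as the parent, prefixed with the bracketed label.
        "summary": f"[{label}] {parent_summary}",
        # The task is linked to the parent with a blocking relationship.
        "linked_parent": parent_key,
        "link_type": "blocking",
        "status": "In Progress",
    }


task = spawn_variant_task("RHEL-0000", "sched: fix example race", "KWF:kernel-rt")
print(task["summary"])
```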

Once QE completes verification of both kernel and kernel-rt, and the MR receives all required approvals, the MR gets the green light to merge. As this is a stream prior to RHEL 9.3, a 'Fixed in Build' is added to this kernel bug for both kernel and kernel-rt, and subsequently the parent kernel bug is linked to both a kernel and kernel-rt advisory to ship to customers.

From that point, the parent kernel issue would move to Integration state. QE only needs to complete final verification now on this parent kernel tracker and transition its state to Release Pending.

Help

For any questions or concerns about this workflow, please reach out to #team-kernel-workflow on Red Hat internal Slack.