Profiling Methodology

Overview

In practice, the VA-Spec schemas used to represent actual data are Profiles: schemas that constrain and/or extend the core Statement, Study Result, and Evidence Line classes to support a specific type of variant knowledge.

The VA-Spec defines a Profiling Methodology that specifies the types of specializations and extensions permitted when authoring profiles, as illustrated in the diagram and detailed in the ‘Profiling Tasks’ section below.

_images/profiling-methodology.png

Examples of specializations defined in Variant Pathogenicity profiles.

(A) Core Proposition and Statement classes, showing a subset of their attributes. (B) ACMG-based Variant Pathogenicity profiles derived from these core classes, with profiling specializations in green. Text in curly braces denotes enumerations, which in some cases are nested inside fields of a MappableConcept. The actual VA-Spec v1.0 schemas for these profiles are here and here.

Profiling Tasks

Profiling tasks supported by the VA-Spec, and illustrated in the example above, include:

  • Profiling Task: Define domain-specific subtypes of general purpose Core Model classes
    Example: Specialization of Proposition into VariantPathogenicityProposition

  • Profiling Task: Define new attributes to capture domain-specific information
    Example: The Statement qualifiers geneContextQualifier and alleleOriginQualifier

  • Profiling Task: Define or import classes for domain entities that profiles are about
    Example: The VariantPathogenicityProposition profile uses MolecularVariation and CategoricalVariation classes imported from VRS and CatVRS, and a Condition class defined in the VA-Spec itself.

  • Profiling Task: Constrain values of core attributes to take specific types as values
    Example: Restricting the VariantPathogenicityProposition.object field to take a Condition as its value

  • Profiling Task: Define value sets and bind them to select attributes
    Example: Restricting nested fields in the MappableConcept object taken by VariantPathogenicityStatement.classification to a set of enumerated values based on ACMG Guideline terminology.

  • Profiling Task: Refine cardinality of select attributes
    Example: Making Statement.classification a required field in the ACMG Variant Pathogenicity Statement.

Profile Authoring

Version 1.0 of the VA-Spec relies on two distinct mechanisms for authoring different categories of Profiles.

Mechanism 1: Inheritance-Based Profiling (for authoring “Base” Profiles)

  • Description: Specializes generic VA core classes for a particular type of knowledge, through formal definition of concrete subclasses.

  • Mechanism: Relies on the inherits and extends keywords of the bespoke GKS Metaschema Processor (MSP), and its requisite tooling, to implement class inheritance and attribute extension, neither of which is natively supported by JSON Schema.

  • Application: Used in authoring “Base Profiles” for Propositions and Study Results, which can be used/referenced within Statement and Evidence Line profiles.

  • Rationale: Allows for the types of attribute extension and addition that are applied in these Base Profiles (e.g. to specialize Proposition subject and object attributes, and to create specific Proposition qualifiers and StudyResult data items).

Inheritance-Based Profiling Example:

# From the source yaml file where the Variant Pathogenicity Proposition Base Profile is authored

VariantPathogenicityProposition:
  inherits: ClinicalVariantProposition           # MSP inherits keyword
  maturity: trial use
  type: object
  description: A proposition describing the role of a variant in causing a heritable condition.
  properties:
    objectCondition:
      extends: object                            # MSP extends keyword
      oneOf:
        - $ref: Condition
        - $refCurie: gks.core:iriReference
      description: The :ref:`Condition` for which the variant impact is stated.
    penetranceQualifier:                         # Addition of new qualifier attribute
      $refCurie: gks.core:MappableConcept
      description: Reports the penetrance of the pathogenic effect...
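
To make the idea of inheritance-based profiling more concrete, the following is a minimal Python sketch of how a processor could flatten an inherits/extends definition into a standalone class. It is not the MSP's actual algorithm. The flatten function, the CLASSES dictionary, and the parent class's properties are hypothetical stand-ins, and the sketch assumes that a property marked with extends replaces the parent property it names.

```python
from copy import deepcopy

# Toy class definitions loosely modeled on the YAML above. The parent's
# properties are illustrative placeholders, not the real VA-Spec schema.
CLASSES = {
    "ClinicalVariantProposition": {
        "type": "object",
        "properties": {
            "subject": {"description": "The variant the proposition is about."},
            "predicate": {"type": "string"},
            "object": {"description": "A generic object of the proposition."},
        },
    },
    "VariantPathogenicityProposition": {
        "inherits": "ClinicalVariantProposition",
        "type": "object",
        "properties": {
            "objectCondition": {
                "extends": "object",
                "description": "The Condition for which the variant impact is stated.",
            },
            "penetranceQualifier": {
                "description": "Reports the penetrance of the pathogenic effect...",
            },
        },
    },
}


def flatten(name, classes):
    """Return a standalone copy of class `name` with inherited properties merged in."""
    cls = deepcopy(classes[name])
    parent_name = cls.pop("inherits", None)
    if parent_name is None:
        return cls
    # Resolve ancestors first so multi-level inheritance also works.
    parent = flatten(parent_name, classes)
    merged = dict(parent.get("properties", {}))
    for prop_name, prop in cls.get("properties", {}).items():
        extended = prop.pop("extends", None)
        if extended is not None:
            # ASSUMPTION: the extending property replaces the parent property.
            merged.pop(extended, None)
        merged[prop_name] = prop
    cls["properties"] = merged
    return cls


flat = flatten("VariantPathogenicityProposition", CLASSES)
print(sorted(flat["properties"]))
# ['objectCondition', 'penetranceQualifier', 'predicate', 'subject']
```

The flattened result no longer carries the inherits or extends keywords, which is the kind of output a plain JSON Schema validator can consume without bespoke tooling.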

Mechanism 2: Composition-Based Profiling (for authoring “Community” Profiles)

  • Description: Defines subschemas that layer additional constraints on top of VA core attributes to refine the values they are able to take.

  • Mechanism: Relies on schema composition using the native JSON Schema allOf keyword, which does not result in creation of concrete subclasses for each profile.

  • Application: Used in authoring “Community Profiles” that add guideline-specific constraints on core Statement and Evidence Line classes. These profiles can leverage base Proposition profiles to represent the semantics of the possible fact that a Statement asserts or that an Evidence Line evaluates evidence against.

  • Rationale: Allows implementers to define simple constraints for Statement and Evidence Line profiles in a way that does not require running the bespoke GKS Metaschema Processor (MSP) tooling.

Composition-Based Profiling Example:

# From the source yaml file where the Variant Pathogenicity Statement ACMG 2015 Community Profile is authored

VariantPathogenicityStatement:
  description: A Statement describing the role of a variant in causing an inherited condition.
  # JSON Schema 'allOf' keyword used for schema composition
  allOf:
  - $ref: "/ga4gh/schema/va-spec/1.0.0/base/json/Statement"
  # list of property definitions that further constrain attributes in the base Statement class
  - properties:
      # A constraint on the Statement.proposition attribute requiring it to take a VariantPathogenicityProposition
      proposition:
        $ref: "/ga4gh/schema/va-spec/1.0.0/base/json/VariantPathogenicityProposition"
        description: A proposition about the pathogenicity of a variant, the validity of which is assessed and reported by the Statement.
      # A constraint on the code field nested within a MappableConcept that requires the 'strength' attribute to take specific values.
      strength:
        description: The strength of support that an ACMG 2015 Variant Pathogenicity statement is determined to provide for or against the proposed pathogenicity of the assessed variant.
        properties:
          primaryCoding:
            properties:
              code:
                enum:
                  - definitive
                  - likely
              system:
                const: ACMG Guidelines, 2015
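
The effect of allOf composition can be illustrated with a small Python sketch: an instance is valid only if it satisfies every subschema in the list, so the profile's constraints are layered on top of the core class rather than replacing it. The check function below is a toy that handles only allOf, properties, required, enum, and const, not a full JSON Schema validator. STATEMENT is a hypothetical stand-in for the core Statement class (its required list is assumed for illustration), and the strength constraint is written in standard JSON Schema form, with code and system nested under a properties keyword.

```python
def check(instance, schema):
    """Return a list of error strings; an empty list means the instance is valid.

    Toy validator: supports only allOf, properties, required, enum and const.
    """
    errors = []
    for sub in schema.get("allOf", []):
        # allOf: the instance must satisfy every subschema at once.
        errors += check(instance, sub)
    if "const" in schema and instance != schema["const"]:
        errors.append(f"expected {schema['const']!r}, got {instance!r}")
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{instance!r} is not one of {schema['enum']}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required property {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors += check(instance[key], sub)
    return errors


# Hypothetical stand-in for the core Statement class (the real one has more attributes).
STATEMENT = {"required": ["proposition"]}

# Constraint layer modeled on the strength constraint in the profile above.
ACMG_LAYER = {
    "properties": {
        "strength": {
            "properties": {
                "primaryCoding": {
                    "properties": {
                        "code": {"enum": ["definitive", "likely"]},
                        "system": {"const": "ACMG Guidelines, 2015"},
                    }
                }
            }
        }
    }
}

PROFILE = {"allOf": [STATEMENT, ACMG_LAYER]}

good = {
    "proposition": {},
    "strength": {"primaryCoding": {"code": "likely", "system": "ACMG Guidelines, 2015"}},
}
# "weak" is a made-up code used only to trigger the enum constraint.
bad = {"strength": {"primaryCoding": {"code": "weak", "system": "ACMG Guidelines, 2015"}}}

print(check(good, PROFILE))  # []
for message in check(bad, PROFILE):
    print(message)
```

Because both subschemas are applied to the same instance, the second instance fails twice: once against the stand-in core class (missing proposition) and once against the profile layer (code outside the enumerated values). No new concrete class is created at any point.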

Future Plans

We recognize that this approach, which uses different mechanisms and ad hoc tooling to author different subsets of profiles, is not ideal. It was adopted given the technologies and bandwidth available at this point in development.

Future versions of the VA-Spec will adopt a single, coherent, and consistent technical approach and tooling support for profile authoring, which will likely leverage the LinkML Framework (in particular, LinkML Map).