lamindb.Schema¶
- class lamindb.Schema(features: list[SQLRecord] | list[tuple[Feature, dict]] | None = None, index: Feature | None = None, slots: dict[str, Schema] | None = None, name: str | None = None, description: str | None = None, itype: str | Registry | FieldAttr | None = None, flexible: bool | None = None, type: Schema | None = None, is_type: bool = False, otype: str | None = None, dtype: str | Type[int | float | str] | None = None, ordered_set: bool = False, minimal_set: bool = True, maximal_set: bool = False, coerce_dtype: bool = False, n: int | None = None)¶
Bases: SQLRecord, CanCurate, TracksRun
Schemas of a dataset such as the set of columns of a DataFrame.
Composite schemas can have multiple slots, e.g., for an AnnData, one schema for slot obs and another one for var.
- Parameters:
features (list[SQLRecord] | list[tuple[Feature, dict]] | None, default: None) – Feature records, e.g., [Feature(...), Feature(...)] or Features with their config, e.g., [Feature(...).with_config(optional=True)].
index (Feature | None, default: None) – A Feature record to validate an index of a DataFrame and therefore also, e.g., AnnData obs and var indices.
slots (dict[str, Schema] | None, default: None) – A dictionary mapping slot names to Schema objects.
name (str | None, default: None) – Name of the Schema.
description (str | None, default: None) – Description of the Schema.
flexible (bool | None, default: None) – Whether to include any feature of the same itype in validation and annotation. If no Features are passed, defaults to True, otherwise to False. This means that if you explicitly pass Features, any additional Features will be disregarded during validation & annotation.
type (Schema | None, default: None) – Type of Schema to group measurements by. Define types like ln.Schema(name="ProteinPanel", is_type=True).
is_type (bool, default: False) – Whether the Schema is a Type.
itype (str | None, default: None) – The feature identifier type (e.g. Feature, Gene, …).
otype (str | None, default: None) – An object type to define the structure of a composite schema (e.g., DataFrame, AnnData).
dtype (str | None, default: None) – The simple type (e.g., “num”, “float”, “int”). Defaults to None for sets of Feature records and to "num" (e.g., for sets of Gene) otherwise.
minimal_set (bool, default: True) – Whether all passed Features are required by default. See optionals for more fine-grained control.
maximal_set (bool, default: False) – Whether additional Features are allowed.
ordered_set (bool, default: False) – Whether Features are required to be ordered.
coerce_dtype (bool, default: False) – When True, attempts to coerce values to the specified dtype during validation, see coerce_dtype.
See also
from_df()
    Validate & annotate a DataFrame with a schema.
from_anndata()
    Validate & annotate an AnnData with a schema.
from_mudata()
    Validate & annotate a MuData with a schema.
from_spatialdata()
    Validate & annotate a SpatialData with a schema.
Examples
The typical way to create a schema:
    import lamindb as ln
    import bionty as bt
    import pandas as pd

    # a schema with a single required feature
    schema = ln.Schema(
        features=[
            ln.Feature(name="required_feature", dtype=str).save(),
        ],
    ).save()

    # a schema that constrains feature identifiers to be valid Ensembl gene IDs or feature names
    schema = ln.Schema(itype=bt.Gene.ensembl_gene_id)
    schema = ln.Schema(itype=ln.Feature)  # is equivalent to itype=ln.Feature.name

    # a schema that requires a single feature but also validates & annotates
    # any additional features with valid feature names
    schema = ln.Schema(
        features=[
            ln.Feature(name="required_feature", dtype=str).save(),
        ],
        itype=ln.Feature,
        flexible=True,
    ).save()
Passing options to the Schema constructor:
    # also validate the index
    schema = ln.Schema(
        features=[
            ln.Feature(name="required_feature", dtype=str).save(),
        ],
        index=ln.Feature(name="sample", dtype=ln.ULabel).save(),
    ).save()

    # mark a single feature as optional and ignore other features of the same identifier type
    schema = ln.Schema(
        features=[
            ln.Feature(name="required_feature", dtype=str).save(),
            ln.Feature(name="feature2", dtype=int).save().with_config(optional=True),
        ],
    ).save()
Alternative constructors (from_values(), from_df()):
    # parse & validate identifier values
    schema = ln.Schema.from_values(
        adata.var["ensemble_id"],
        field=bt.Gene.ensembl_gene_id,
        organism="mouse",
    ).save()

    # from a dataframe
    df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
    schema = ln.Schema.from_df(df)
Attributes¶
- DoesNotExist = <class 'lamindb.models.schema.Schema.DoesNotExist'>¶
- Meta = <class 'lamindb.models.sqlrecord.SQLRecord.Meta'>¶
- MultipleObjectsReturned = <class 'lamindb.models.schema.Schema.MultipleObjectsReturned'>¶
- artifacts: Artifact¶
The artifacts that measure a feature set that matches this schema.
- branch: int¶
Whether record is on a branch or in another “special state”.
This dictates where a record appears in exploration, queries & searches, whether a record can be edited, and whether a record acts as a template.
Branch name coding is handled through LaminHub. “Special state” coding is as defined below.
Note that there is no “main” branch as in git; instead, all five special codes (-1, 0, 1, 2, 3) act as sub-specifications for what git would call the main branch. This also means that for records that live on a branch only the “default state” exists. E.g., one can only turn a record into a template, lock it, archive it, or trash it once it’s merged onto the main branch.
3: template (hidden in queries & searches)
2: locked (same as default, but locked for edits except for space admins)
1: default (visible in queries & searches)
0: archive (hidden, meant to be kept, locked for edits for everyone)
-1: trash (hidden, scheduled for deletion)
An integer greater than 3 codes a branch that collaborators can use to create drafts that can be merged onto the main branch in an experience akin to a Pull Request. The mapping onto a semantic branch name is handled through LaminHub.
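The coding above can be summarized in a small lookup. This is only an illustrative sketch of the documented table; `SPECIAL_STATES` and `describe_branch` are hypothetical names and not part of lamindb.

```python
# Illustrative only: mirrors the special-state codes documented above.
# SPECIAL_STATES and describe_branch are hypothetical, not lamindb API.
SPECIAL_STATES = {
    3: "template",   # hidden in queries & searches
    2: "locked",     # same as default, but locked for edits
    1: "default",    # visible in queries & searches
    0: "archive",    # hidden, meant to be kept
    -1: "trash",     # hidden, scheduled for deletion
}


def describe_branch(code: int) -> str:
    """Return a readable label for a branch code."""
    if code in SPECIAL_STATES:
        return SPECIAL_STATES[code]
    if code > 3:
        # Semantic branch names are resolved by LaminHub, not here.
        return "user branch"
    raise ValueError(f"undocumented branch code: {code}")


print(describe_branch(1))
print(describe_branch(7))
```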
- branch_id¶
- cell_markers¶
Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.
In the example:
    class Pizza(Model):
        toppings = ManyToManyField(Topping, related_name='pizzas')
Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- property coerce_dtype: bool¶
Whether dtypes should be coerced during validation.
For example, an object-dtyped pandas column can be coerced to categorical and would pass validation if this is true.
- components: Schema¶
Components of this schema.
- composites: Schema¶
The composite schemas that contain this schema as a component.
For example, an AnnData composes multiple schemas: var[DataFrameT], obs[DataFrame], obsm[Array], uns[dict], etc.
- created_by: User¶
Creator of record.
- created_by_id¶
- features: Feature¶
The features contained in the schema.
- property flexible: bool¶
Indicates how to handle validation and annotation in case features are not defined.
Examples
Make a rigid schema flexible:
    schema = ln.Schema.get(name="my_schema")
    schema.flexible = True
    schema.save()
During schema creation:
    # if you're not passing features but just defining the itype, defaults to flexible = True
    schema = ln.Schema(itype=ln.Feature).save()
    assert schema.flexible

    # if you're passing features, defaults to flexible = False
    schema = ln.Schema(
        features=[ln.Feature(name="my_required_feature", dtype=int).save()],
    )
    assert not schema.flexible

    # you can also validate & annotate features in addition to those that you're explicitly defining:
    schema = ln.Schema(
        features=[ln.Feature(name="my_required_feature", dtype=int).save()],
        flexible=True,
    )
    assert schema.flexible
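The default described here (True when no features are passed, False otherwise, unless set explicitly) can be written as a few lines of plain Python. This is a toy model of the documented rule that runs without a database; `resolve_flexible` is a hypothetical helper, not lamindb's implementation.

```python
from typing import Optional, Sequence


def resolve_flexible(
    features: Optional[Sequence[str]] = None,
    flexible: Optional[bool] = None,
) -> bool:
    """Toy model of the documented default for `flexible` (hypothetical helper)."""
    if flexible is not None:
        # An explicit value always wins.
        return flexible
    # No features passed -> True, otherwise False.
    return not features


print(resolve_flexible())                       # only an itype was given
print(resolve_flexible(["my_required_feature"]))
print(resolve_flexible(["my_required_feature"], flexible=True))
```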
- genes¶
Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.
In the example:
    class Pizza(Model):
        toppings = ManyToManyField(Topping, related_name='pizzas')
Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- property index: None | Feature¶
The feature configured to act as index.
To unset it, set schema.index to None.
- instances: Schema¶
Instances of this type.
- links_cellmarker¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- links_component¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- links_composite¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- links_feature¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- links_gene¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- links_pathway¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- links_project¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- links_protein¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- property members: QuerySet¶
A queryset for the individual records in the feature set underlying the schema.
Unlike schema.features, schema.genes, schema.proteins, etc., this queryset is ordered and doesn’t require knowledge of the entity.
- objects = <lamindb.models.query_manager.QueryManager object>¶
- property optionals: SchemaOptionals¶
Manage optional features.
Example
    # a schema with optional "sample_name"
    schema_optional_sample_name = ln.Schema(
        features=[
            ln.Feature(name="sample_id", dtype=str).save(),  # required
            ln.Feature(name="sample_name", dtype=str).save().with_config(optional=True),  # optional
        ],
    ).save()

    # raises ValidationError since `sample_id` is required
    ln.curators.DataFrameCurator(
        pd.DataFrame(
            {
                "sample_name": ["Sample 1", "Sample 2"],
            }
        ),
        schema=schema_optional_sample_name,
    ).validate()

    # passes because only an optional column is missing
    ln.curators.DataFrameCurator(
        pd.DataFrame(
            {
                "sample_id": ["sample1", "sample2"],
            }
        ),
        schema=schema_optional_sample_name,
    ).validate()
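The required-versus-optional distinction in the example above can be pictured with a database-free sketch. It only models which column names must be present and is not how lamindb's curators work internally; `missing_required` is a hypothetical helper.

```python
from typing import Iterable, List


def missing_required(columns: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return required feature names that are absent from `columns` (toy model)."""
    present = set(columns)
    return [name for name in required if name not in present]


required = ["sample_id"]      # features without optional=True
optional = ["sample_name"]    # features configured with optional=True

# Only the optional column is present: the required one is reported as missing.
print(missing_required(["sample_name"], required))

# Only the required column is present: nothing is missing, so this would pass.
print(missing_required(["sample_id"], required))
```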
- pathways¶
Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.
In the example:
    class Pizza(Model):
        toppings = ManyToManyField(Topping, related_name='pizzas')
Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- property pk¶
- projects: Project¶
Linked projects.
- proteins¶
Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.
In the example:
    class Pizza(Model):
        toppings = ManyToManyField(Topping, related_name='pizzas')
Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- run: Run | None¶
Run that created record.
- run_id¶
- sheets¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- property slots: dict[str, Schema]¶
Slots.
Examples
    # define composite schema
    anndata_schema = ln.Schema(
        name="small_dataset1_anndata_schema",
        otype="AnnData",
        slots={"obs": obs_schema, "var": var_schema},
    ).save()

    # access slots
    anndata_schema.slots
    # {'obs': <Schema: obs_schema>, 'var': <Schema: var_schema>}
- space: Space¶
The space in which the record lives.
- space_id¶
- type: Schema | None¶
Type of schema.
Allows grouping schemas by type, e.g., all measurements evaluating gene expression vs. protein expression vs. multi modal.
You can define types via ln.Schema(name="ProteinPanel", is_type=True).
Here are a few more examples for type names: 'ExpressionPanel', 'ProteinPanel', 'Multimodal', 'Metadata', 'Embedding'.
- type_id¶
Class methods¶
- classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, source=None)¶
Create schema for valid columns.
- Return type:
Schema | None
- classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, source=None, raise_validation_error=True)¶
Create feature set for validated features.
- Parameters:
values (list[str] | Series | array) – A list of values, like feature names or ids.
field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference registry to map values.
type (str | None, default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to "float" otherwise.
name (str | None, default: None) – A name.
organism (SQLRecord | str | None, default: None) – An organism to resolve gene mapping.
source (SQLRecord | None, default: None) – A public ontology to resolve feature identifier mapping.
raise_validation_error (bool, default: True) – Whether to raise a validation error if some values are not valid.
- Raises:
ValidationError – If some values are not valid.
- Return type:
Schema
Example
    import lamindb as ln
    import bionty as bt

    features = [ln.Feature(name=feat, dtype="str").save() for feat in ["feat11", "feat21"]]
    schema = ln.Schema.from_values(features)

    genes = ["ENSG00000139618", "ENSG00000198786"]
    schema = ln.Schema.from_values(genes, bt.Gene.ensembl_gene_id, "float")
- classmethod inspect(values, field=None, *, mute=False, organism=None, source=None, from_source=True, strict_source=False)¶
Inspect if values are mappable to a field.
Being mappable means that an exact match exists.
- Parameters:
values (list[str] | Series | array) – Values that will be checked against the field.
field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.
mute (bool, default: False) – Whether to mute logging.
organism (str | SQLRecord | None, default: None) – An Organism name or record.
source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to inspect against.
strict_source (bool, default: False) – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.
- Return type:
See also
Example:
    import bionty as bt

    # save some gene records
    bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

    # inspect gene symbols
    gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
    result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
    assert result.validated == ["A1CF", "A1BG"]
    assert result.non_validated == ["FANCD1", "FANCD20"]
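To make "mappable means that an exact match exists" concrete without a database, here is a toy partition of values into validated and non-validated lists. `ToyInspectResult` and `toy_inspect` are hypothetical stand-ins; the real method also consults registries, organisms and public sources.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ToyInspectResult:
    """Hypothetical stand-in for the object returned by inspect()."""
    validated: List[str] = field(default_factory=list)
    non_validated: List[str] = field(default_factory=list)


def toy_inspect(values: Iterable[str], known: Iterable[str]) -> ToyInspectResult:
    """Split values by exact membership in `known`, preserving input order."""
    known_set = set(known)
    result = ToyInspectResult()
    for value in values:
        if value in known_set:
            result.validated.append(value)
        else:
            result.non_validated.append(value)
    return result


result = toy_inspect(
    ["A1CF", "A1BG", "FANCD1", "FANCD20"],
    known=["A1CF", "A1BG", "BRCA2"],
)
print(result)
```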
- classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, source_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None, strict_source=False)¶
Maps input synonyms to standardized names.
- Parameters:
values (Iterable) – Identifiers that will be standardized.
field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.
return_field (str | DeferredAttribute | None, default: None) – The field to return. Defaults to field.
return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.
case_sensitive (bool, default: False) – Whether the mapping is case sensitive.
mute (bool, default: False) – Whether to mute logging.
source_aware (bool, default: True) – Whether to standardize from public source. Defaults to True for BioRecord registries.
keep (Literal['first', 'last', False], default: 'first') – When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
    - "first": returns the first mapped standardized name
    - "last": returns the last mapped standardized name
    - False: returns all mapped standardized names.
    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.
    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.
synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.
organism (str | SQLRecord | None, default: None) – An Organism name or record.
source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to validate against.
strict_source (bool, default: False) – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.
- Return type:
list[str] | dict[str, str]
- Returns:
If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.
See also
add_synonym()
Add synonyms.
remove_synonym()
Remove synonyms.
Example:
    import bionty as bt

    # save some gene records
    bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

    # standardize gene synonyms
    gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
    bt.Gene.standardize(gene_synonyms)
    #> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
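The interplay of keep and return_mapper is easier to see on a toy synonym table. The sketch below is an exact-match, in-memory model under stated assumptions: `toy_standardize` and its `synonym_to_names` argument are hypothetical, and case handling and source lookups are left out.

```python
from typing import Dict, List, Union


def toy_standardize(
    values: List[str],
    synonym_to_names: Dict[str, List[str]],
    keep: Union[str, bool] = "first",
    return_mapper: bool = False,
):
    """Toy model: map synonyms to standardized names; unknown values pass through."""

    def pick(names: List[str]):
        if keep == "first":
            return names[0]
        if keep == "last":
            return names[-1]
        if keep is False:
            # Keep every candidate; duplicates surface as a nested list.
            return names if len(names) > 1 else names[0]
        raise ValueError(f"unsupported keep value: {keep!r}")

    mapper = {v: pick(synonym_to_names[v]) for v in values if v in synonym_to_names}
    if return_mapper:
        return mapper
    return [mapper.get(v, v) for v in values]


table = {"FANCD1": ["BRCA2"]}
print(toy_standardize(["A1CF", "A1BG", "FANCD1", "FANCD20"], table))
print(toy_standardize(["FANCD1"], table, return_mapper=True))
```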
- classmethod validate(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)¶
Validate values against existing values of a string field.
Note that this is strict validation: it only asserts exact matches.
- Parameters:
values (list[str] | Series | array) – Values that will be validated against the field.
field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.
mute (bool, default: False) – Whether to mute logging.
organism (str | SQLRecord | None, default: None) – An Organism name or record.
source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to validate against.
strict_source (bool, default: False) – Determines the validation behavior against records in the registry.
    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.
    Note: this parameter won’t affect validation against public sources.
- Return type:
ndarray
- Returns:
A vector of booleans indicating if an element is validated.
See also
Example:
    import bionty as bt

    bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

    gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
    bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
    #> array([ True, True, False, False])
Methods¶
- add_synonym(synonym, force=False, save=None)¶
Add synonyms to a record.
- Parameters:
synonym (str | list[str] | Series | array) – The synonyms to add to the record.
force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.
save (bool | None, default: None) – Whether to save the record to the database.
See also
remove_synonym()
Remove synonyms.
Example:
    import bionty as bt

    # save "T cell" record
    record = bt.CellType.from_source(name="T cell").save()
    record.synonyms
    #> "T-cell|T lymphocyte|T-lymphocyte"

    # add a synonym
    record.add_synonym("T cells")
    record.synonyms
    #> "T cells|T-cell|T-lymphocyte|T lymphocyte"
- async adelete(using=None, keep_parents=False)¶
- async arefresh_from_db(using=None, fields=None, from_queryset=None)¶
- async asave(*args, force_insert=False, force_update=False, using=None, update_fields=None)¶
- clean()¶
Hook for doing any extra model-wide validation after clean() has been called on every field by self.clean_fields. Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.
- clean_fields(exclude=None)¶
Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.
- date_error_message(lookup_type, field_name, unique_for)¶
- delete()¶
Delete.
- Return type:
None
- describe(return_str=False)¶
Describe schema.
- Return type:
None | str
- get_constraints()¶
- get_deferred_fields()¶
Return a set containing names of deferred fields for this instance.
- prepare_database_save(field)¶
- refresh_from_db(using=None, fields=None, from_queryset=None)¶
Reload field values from the database.
By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.
Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.
When accessing deferred fields of an instance, the deferred loading of the field will call this method.
- remove_synonym(synonym)¶
Remove synonyms from a record.
- Parameters:
synonym (str | list[str] | Series | array) – The synonym values to remove.
See also
add_synonym()
Add synonyms
Example:
    import bionty as bt

    # save "T cell" record
    record = bt.CellType.from_source(name="T cell").save()
    record.synonyms
    #> "T-cell|T lymphocyte|T-lymphocyte"

    # remove a synonym
    record.remove_synonym("T-cell")
    record.synonyms
    #> "T lymphocyte|T-lymphocyte"
- save_base(raw=False, force_insert=False, force_update=False, using=None, update_fields=None)¶
Handle the parts of saving which should be done only once per save, yet need to be done in raw saves, too. This includes some sanity checks and signal sending.
The ‘raw’ argument is telling save_base not to save any parent models and not to do any changes to the values before save. This is used by fixture loading.
- serializable_value(field_name)¶
Return the value of the field name for this instance. If the field is a foreign key, return the id value instead of the object. If there’s no Field object with this name on the model, return the model attribute’s value.
Used to serialize a field’s value (in the serializer, or form output, for example). Normally, you would just access the attribute directly and not use this method.
- set_abbr(value)¶
Set value for abbr field and add to synonyms.
- Parameters:
value (str) – A value for an abbreviation.
See also
Example:
    import bionty as bt

    # save an experimental factor record
    scrna = bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
    assert scrna.abbr is None
    assert scrna.synonyms == "single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing"

    # set abbreviation
    scrna.set_abbr("scRNA")
    assert scrna.abbr == "scRNA"
    # synonyms are updated
    assert scrna.synonyms == "scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq"
- unique_error_message(model_class, unique_check)¶
- validate_constraints(exclude=None)¶
- validate_unique(exclude=None)¶
Check unique constraints on the model and raise ValidationError if any failed.