lamindb.Transform¶
- class lamindb.Transform(name: str, key: str | None = None, type: TransformType | None = None, revises: Transform | None = None)¶
Bases:
SQLRecord
,IsVersioned
Data transformations such as scripts, notebooks, functions, or pipelines.
A “transform” can refer to a Python function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (
Run
). A run has inputs and outputs.A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.
Transforms are versioned so that a given transform version maps on a given source code version.
Can I sync transforms to git?
If you switch on
sync_git_repo
a script-like transform is synched to its hashed state in a git repository upon callingln.track()
.>>> ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb" >>> ln.track()
The definition of transforms and runs is consistent the OpenLineage specification where a
Transform
record would be called a “job” and aRun
record a “run”.- Parameters:
name –
str
A name or title.key –
str | None = None
A short name or path-like semantic key.type –
TransformType | None = "pipeline"
SeeTransformType
.revises –
Transform | None = None
An old version of the transform.
Notes
Examples
Create a transform for a pipeline:
>>> transform = ln.Transform(key="Cell Ranger", version="7.2.0", type="pipeline").save()
Create a transform from a notebook:
>>> ln.track()
View predecessors of a transform:
>>> transform.view_lineage()
Attributes¶
- DoesNotExist = <class 'lamindb.models.transform.Transform.DoesNotExist'>¶
- Meta = <class 'lamindb.models.sqlrecord.SQLRecord.Meta'>¶
- MultipleObjectsReturned = <class 'lamindb.models.transform.Transform.MultipleObjectsReturned'>¶
- branch: int¶
Whether record is on a branch or in another “special state”.
This dictates where a record appears in exploration, queries & searches, whether a record can be edited, and whether a record acts as a template.
Branch name coding is handled through LaminHub. “Special state” coding is as defined below.
One should note that there is no “main” branch as in git, but that all five special codes (-1, 0, 1, 2, 3) act as sub-specfications for what git would call the main branch. This also means that for records that live on a branch only the “default state” exists. E.g., one can only turn a record into a template, lock it, archive it, or trash it once it’s merged onto the main branch.
3: template (hidden in queries & searches)
2: locked (same as default, but locked for edits except for space admins)
1: default (visible in queries & searches)
0: archive (hidden, meant to be kept, locked for edits for everyone)
-1: trash (hidden, scheduled for deletion)
An integer higher than >3 codes a branch that can be used for collaborators to create drafts that can be merged onto the main branch in an experience akin to a Pull Request. The mapping onto a semantic branch name is handled through LaminHub.
- branch_id¶
- created_by: User¶
Creator of record.
- created_by_id¶
- links_project¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model): parent = ForeignKey(Parent, related_name='children')
Parent.children
is aReverseManyToOneDescriptor
instance.Most of the implementation is delegated to a dynamically defined manager class built by
create_forward_many_to_many_manager()
defined below.
- links_reference¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model): parent = ForeignKey(Parent, related_name='children')
Parent.children
is aReverseManyToOneDescriptor
instance.Most of the implementation is delegated to a dynamically defined manager class built by
create_forward_many_to_many_manager()
defined below.
- links_ulabel¶
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model): parent = ForeignKey(Parent, related_name='children')
Parent.children
is aReverseManyToOneDescriptor
instance.Most of the implementation is delegated to a dynamically defined manager class built by
create_forward_many_to_many_manager()
defined below.
- property name: str¶
Name of the transform.
Splits
key
on/
and returns the last element.
- objects = <lamindb.models.query_manager.QueryManager object>¶
- property pk¶
- predecessors: Transform¶
Preceding transforms.
These are auto-populated whenever an artifact or collection serves as a run input, e.g.,
artifact.run
andartifact.transform
get populated & saved.The table provides a more convenient method to query for the predecessors that bypasses querying the
Run
.It also allows to manually add predecessors whose outputs are not tracked in a run.
- projects: Project¶
Linked projects.
- references: Reference¶
Linked references.
- runs: Run¶
Runs of this transform.
- space: Space¶
The space in which the record lives.
- space_id¶
- property stem_uid: str¶
Universal id characterizing the version family.
The full uid of a record is obtained via concatenating the stem uid and version information:
stem_uid = random_base62(n_char) # a random base62 sequence of length 12 (transform) or 16 (artifact, collection) version_uid = "0000" # an auto-incrementing 4-digit base62 number uid = f"{stem_uid}{version_uid}" # concatenate the stem_uid & version_uid
- successors: Transform¶
Subsequent transforms.
See
predecessors
.
- ulabels: ULabel¶
ULabel annotations of this transform.
Methods¶
- async adelete(using=None, keep_parents=False)¶
- async arefresh_from_db(using=None, fields=None, from_queryset=None)¶
- async asave(*args, force_insert=False, force_update=False, using=None, update_fields=None)¶
- clean()¶
Hook for doing any extra model-wide validation after clean() has been called on every field by self.clean_fields. Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.
- clean_fields(exclude=None)¶
Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.
- date_error_message(lookup_type, field_name, unique_for)¶
- delete()¶
Delete.
- Return type:
None
- get_constraints()¶
- get_deferred_fields()¶
Return a set containing names of deferred fields for this instance.
- prepare_database_save(field)¶
- refresh_from_db(using=None, fields=None, from_queryset=None)¶
Reload field values from the database.
By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.
Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.
When accessing deferred fields of an instance, the deferred loading of the field will call this method.
- save_base(raw=False, force_insert=False, force_update=False, using=None, update_fields=None)¶
Handle the parts of saving which should be done only once per save, yet need to be done in raw saves, too. This includes some sanity checks and signal sending.
The ‘raw’ argument is telling save_base not to save any parent models and not to do any changes to the values before save. This is used by fixture loading.
- serializable_value(field_name)¶
Return the value of the field name for this instance. If the field is a foreign key, return the id value instead of the object. If there’s no Field object with this name on the model, return the model attribute’s value.
Used to serialize a field’s value (in the serializer, or form output, for example). Normally, you would just access the attribute directly and not use this method.
- unique_error_message(model_class, unique_check)¶
- validate_constraints(exclude=None)¶
- validate_unique(exclude=None)¶
Check unique constraints on the model and raise ValidationError if any failed.
- view_lineage(with_successors=False, distance=5)¶
View lineage of transforms.