Transfer data

This guide shows how to transfer data from a source database instance into the current default database instance.

# !pip install 'lamindb[jupyter,aws,bionty]'
!lamin init --storage ./test-transfer --modules bionty
! using anonymous user (to identify, call: lamin login)
 initialized lamindb: anonymous/test-transfer
import lamindb as ln

ln.track("ITeOtm7bhtdq0000")
 connected lamindb: anonymous/test-transfer
 created Transform('ITeOtm7bhtdq0000'), started new Run('hLGFgBO3...') at 2025-01-12 14:04:00 UTC
 notebook imports: lamindb==1.0a2

Query all artifacts in the laminlabs/lamindata instance and filter them to their latest versions.

# query all latest artifact versions
artifacts = ln.Artifact.using("laminlabs/lamindata").filter(is_latest=True)

# convert the QuerySet to a DataFrame and show the first 5 rows
artifacts.df().head()
! source modules has additional modules: {'wetlab', 'ourprojects'}
consider mounting these registry modules to transfer all metadata
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _curator _overwrite_versions space_id storage_id version is_latest run_id created_at created_by_id aux _branch_code
id
607 sRapK07mMtToihzFeTaf None View Papalexi21 in Vitessce .vitessce.json None None 1527 jfAtjNNzdvetUaEo5zhf0Q NaN NaN md5 True None False 1 2 None True 141.0 2024-04-30 12:51:16.348884+00:00 2 None 1
726 HXJ4DDAw8012jVKwoxgd None View Kuppe2022 in Vitessce .vitessce.json None None 5258 JsVK8X8EGRsyTEMnD3Z-6g NaN NaN md5 True None False 1 2 None True 198.0 2024-06-26 10:35:31.697669+00:00 2 None 1
895 nbX7Pk0SAPHNlsQD0000 devdata/params_2024-09-30_11-44-22.json None .json None None 38084 s6viX7LZ6KsjWcXigAn0eg NaN NaN md5 True None False 1 2 None True NaN 2024-10-02 15:25:49.609268+00:00 9 None 1
815 XmeH4JgiJFha7Nl90000 schmidt22_perturbseq/schmidt22_perturbseq.h5ad schmidt22 perturbseq counts .h5ad None AnnData 20659936 MwfMo7FUjrdk5mzTHx9RMw NaN NaN md5-n False None False 1 2 None True 377.0 2024-06-18 09:26:45.885472+00:00 2 None 1
1010 dP0F1fEQWtorhDaI0000 example_datasets/small_dataset2.h5ad None .h5ad dataset AnnData 21224 7ok_2cIe73owydEGaj7m0A NaN 3.0 md5 True None False 1 2 None True 347.0 2024-11-25 14:59:38.945319+00:00 9 None 1

You can now further subset or search the QuerySet. Here we filter for artifacts whose description contains “Tabula Sapiens” and take the first match.

artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.describe()
! source modules has additional modules: {'wetlab', 'ourprojects'}
consider mounting these registry modules to transfer all metadata
Artifact .h5ad
├── General
│   ├── .uid = 'dPraor9rU1EofcFb6Wph'
│   ├── .key = 'tabula_sapiens_lung.h5ad'
│   ├── .size = 3899435772
│   ├── .hash = '8mB1KK2wd51F6HQdvqipcQ'
│   ├── .path = s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── .created_by = Koncopd (Sergei Rybakov)
│   ├── .created_at = 2023-07-14 19:00:30
│   └── .transform = 'Ingest Tabula Sapiens Lung'
└── Labels
    └── .tissues                    bionty.Tissue              lung                                     
        .cell_types                 bionty.CellType            myofibroblast cell, B cell, capillary ae…
        .experimental_factors       bionty.ExperimentalFactor  anoxya, stroke                           
        .ulabels                    ULabel                     TSP1, TSP2, TSP14                        

By saving the artifact record that’s currently attached to the source database instance, you transfer it to the default database instance.

artifact.save()
/home/runner/work/lamindb/lamindb/lamindb/_record.py:635: FutureWarning: `name` will be removed soon, please pass 'Transfer from `laminlabs/lamindata`' to `description` instead
  transform = Transform(
 mapped records: Tissue(uid='7Tt4iEKc'), CellType(uid='5tiBvp96'), CellType(uid='7Crr32HI'), CellType(uid='6dzoXJ3Y'), CellType(uid='01NqvhnI'), CellType(uid='5NceZTYm'), CellType(uid='4PSMdO3I'), CellType(uid='3JO0EdVd'), CellType(uid='6rfrjhvo'), CellType(uid='37mWPv6o'), CellType(uid='5Z76sCep'), CellType(uid='2OWUH6Z1'), CellType(uid='5TU8SFt5'), CellType(uid='ryEtgi1y'), CellType(uid='1lMgAPE8'), CellType(uid='7m6Ruz32'), CellType(uid='42qbvc90'), CellType(uid='puGNwNrs'), CellType(uid='1T8bGe2I'), CellType(uid='6IC9NGJE'), CellType(uid='6ujMwy7s'), CellType(uid='3eecYgWR'), CellType(uid='zQ4dyjEs'), CellType(uid='7mNqzyFE'), CellType(uid='5A9EFjNB'), CellType(uid='3lsrLTv6'), CellType(uid='1HYtHpIc'), CellType(uid='6UmKFrzn'), CellType(uid='7eZArDpo'), CellType(uid='2KCFdGIk'), CellType(uid='1V5wVqK5'), CellType(uid='5i19XYug'), CellType(uid='2nPA0h4F'), CellType(uid='5Xi2OLvZ'), CellType(uid='3kaL3W1c'), ExperimentalFactor(uid='5YDCOg0V'), ExperimentalFactor(uid='7R1OhRJ7')
 transferred records: Artifact(uid='dPraor9rU1EofcFb6Wph'), Storage(uid='D9BilDV2'), CellType(uid='4mZaXZQg'), CellType(uid='5rVn0X39'), CellType(uid='EWy46Sey'), CellType(uid='4yqLzwwm'), ULabel(uid='vfLXaHgD'), ULabel(uid='gk6w8qC5'), ULabel(uid='tZCTk48f')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', space_id=1, storage_id=2, run_id=2, created_by_id=1, created_at=2025-01-12 14:04:07 UTC)

How do I know whether a record is saved in the default database instance?

Every record has an attribute ._state.db which can take the following values:

  • None: the record has not yet been saved to any database

  • "default": the record is saved on the default database instance

  • "account/name": the record is saved on a non-default database instance referenced by account/name (e.g., laminlabs/lamindata)
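
As a minimal sketch, the three cases above can be wrapped in a small helper. The function name `describe_db_state` is hypothetical and not part of lamindb; it only reads the `._state.db` attribute described above. The stand-in objects built with `SimpleNamespace` mimic that attribute so the snippet runs without a database connection.

```python
from types import SimpleNamespace


def describe_db_state(record) -> str:
    """Summarize where a record lives, based on its `._state.db` attribute."""
    db = record._state.db
    if db is None:
        return "not saved to any database"
    if db == "default":
        return "saved on the default database instance"
    # any other value is an "account/name" slug of a non-default instance
    return f"saved on the non-default database instance {db}"


# stand-in records that mimic only the `._state.db` attribute (illustration only)
unsaved = SimpleNamespace(_state=SimpleNamespace(db=None))
local = SimpleNamespace(_state=SimpleNamespace(db="default"))
remote = SimpleNamespace(_state=SimpleNamespace(db="laminlabs/lamindata"))

for record in (unsaved, local, remote):
    print(describe_db_state(record))
```

With a real record, pass it directly, for example `describe_db_state(artifact)`.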

The artifact record and all associated feature and label records have been transferred to the current default database instance.

artifact.describe()
Artifact .h5ad
├── General
│   ├── .uid = 'dPraor9rU1EofcFb6Wph'
│   ├── .key = 'tabula_sapiens_lung.h5ad'
│   ├── .size = 3899435772
│   ├── .hash = '8mB1KK2wd51F6HQdvqipcQ'
│   ├── .path = s3://lamindata/tabula_sapiens_lung.h5ad
│   ├── .created_by = anonymous
│   ├── .created_at = 2025-01-12 14:04:07
│   └── .transform = 'Transfer from `laminlabs/lamindata`'
└── Labels
    └── .tissues                    bionty.Tissue              lung                                     
        .cell_types                 bionty.CellType            pulmonary alveolar type 1 cell, adventit…
        .experimental_factors       bionty.ExperimentalFactor  anoxya, stroke                           
        .ulabels                    ULabel                     TSP1, TSP2, TSP14                        

The data itself remained in its original storage location, s3://lamindata, which has been registered in the current instance as a read-only storage location.

ln.Storage.df()
    uid           root                                               description  type   region     instance_uid  space_id  run_id  created_at                        created_by_id  aux   _branch_code
id
2   D9BilDV2      s3://lamindata                                     None         s3     us-east-1  4XIuR0tvaiXM  1         2.0     2025-01-12 14:04:07.738416+00:00  1              None  1
1   rn3r5XGzx5SY  /home/runner/work/lamindb/lamindb/docs/test-tr...  None         local  None       1FHu5eE0uxm4  1         NaN     2025-01-12 14:03:53.349593+00:00  1              None  1

See the state of the database.

ln.view()
****************
* module: core *
****************
Artifact
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _curator _overwrite_versions space_id storage_id version is_latest run_id created_at created_by_id aux _branch_code
id
1 dPraor9rU1EofcFb6Wph tabula_sapiens_lung.h5ad Part of Tabula Sapiens, a benchmark, first-dra... .h5ad None None 3899435772 8mB1KK2wd51F6HQdvqipcQ None None sha1-fl False None False 1 2 None True 2 2025-01-12 14:04:07.740843+00:00 1 None 1
Run
uid name started_at finished_at reference reference_type _is_consecutive _status_code space_id transform_id report_id _logfile_id environment_id initiated_by_run_id created_at created_by_id aux _branch_code
id
1 hLGFgBO3g6c38sxU2IHo None 2025-01-12 14:04:00.694186+00:00 None None None None 0 1 1 None None None NaN 2025-01-12 14:04:00.694239+00:00 1 None 1
2 IXvSkUllssrRUGyGjxHX None 2025-01-12 14:04:07.595760+00:00 None None None None 0 1 2 None None None 1.0 2025-01-12 14:04:07.595839+00:00 1 None 1
Storage
uid root description type region instance_uid space_id run_id created_at created_by_id aux _branch_code
id
2 D9BilDV2 s3://lamindata None s3 us-east-1 4XIuR0tvaiXM 1 2.0 2025-01-12 14:04:07.738416+00:00 1 None 1
1 rn3r5XGzx5SY /home/runner/work/lamindb/lamindb/docs/test-tr... None local None 1FHu5eE0uxm4 1 NaN 2025-01-12 14:03:53.349593+00:00 1 None 1
Transform
uid key description type source_code hash reference reference_type space_id _template_id version is_latest created_at created_by_id aux _branch_code
id
2 4XIuR0tvaiXM0000 transfers/4XIuR0tvaiXM Transfer from `laminlabs/lamindata` function None None None None 1 None None True 2025-01-12 14:04:07.587213+00:00 1 None 1
1 ITeOtm7bhtdq0000 transfer.ipynb Transfer data notebook None None None None 1 None None True 2025-01-12 14:04:00.685331+00:00 1 None 1
ULabel
uid name is_concept description reference reference_type space_id run_id created_at created_by_id aux _branch_code
id
3 tZCTk48f TSP14 False None None None 1 2 2025-01-12 14:04:17.468316+00:00 1 None 1
2 gk6w8qC5 TSP2 False None None None 1 2 2025-01-12 14:04:17.328364+00:00 1 None 1
1 vfLXaHgD TSP1 False None None None 1 2 2025-01-12 14:04:17.185155+00:00 1 None 1
******************
* module: bionty *
******************
CellType
uid name ontology_id abbr synonyms description space_id source_id run_id created_at created_by_id aux _branch_code
id
111 4yqLzwwm bronchial vessel endothelial cell None None None None 1 NaN 2 2025-01-12 14:04:14.619260+00:00 1 None 1
110 EWy46Sey respiratory mucous cell None None None None 1 NaN 2 2025-01-12 14:04:14.200594+00:00 1 None 1
109 5rVn0X39 capillary aerocyte None None None None 1 NaN 2 2025-01-12 14:04:14.059474+00:00 1 None 1
108 4mZaXZQg alveolar fibroblast None None None None 1 NaN 2 2025-01-12 14:04:12.325028+00:00 1 None 1
107 3hXuCKYH perivascular cell CL:4033054 None None A Cell That Is Adjacent To A Vessel. A Perivas... 1 32.0 1 2025-01-12 14:04:11.679299+00:00 1 None 1
106 4qrbhCCl respiratory ciliated cell CL:4030034 None ciliated cell of the respiratory tract A Ciliated Cell Of The Respiratory System. Cil... 1 32.0 1 2025-01-12 14:04:11.679196+00:00 1 None 1
105 2aMXs0ko microvascular endothelial cell CL:2000008 None None Any Blood Vessel Endothelial Cell That Is Part... 1 32.0 1 2025-01-12 14:04:11.675661+00:00 1 None 1
ExperimentalFactor
uid name ontology_id abbr synonyms description molecule instrument measurement space_id source_id run_id created_at created_by_id aux _branch_code
id
8 1was9kRO hypoxia EFO:0009444 None None A Decrease In The Amount Of Oxygen In The Body... None None None 1 67 1 2025-01-12 14:04:16.618092+00:00 1 None 1
7 2lctIHmn central nervous system disease EFO:0009386 None central nervous system disorder|central nervou... A Disease Involving The Central Nervous System. None None None 1 67 1 2025-01-12 14:04:16.617998+00:00 1 None 1
6 68LLeA7O brain disease EFO:0005774 None disorder of brain|disease or disorder of brain... A Disease Affecting The Brain Or Part Of The B... None None None 1 67 1 2025-01-12 14:04:16.617909+00:00 1 None 1
5 2xDSpjH7 cerebrovascular disorder EFO:0003763 None Vascular Disorder, Intracranial|Cerebrovascula... A Disorder Resulting From Inadequate Blood Flo... None None None 1 67 1 2025-01-12 14:04:16.617815+00:00 1 None 1
4 6ISbvepx nervous system disease EFO:0000618 None nervous system disorder|neurologic disease|neu... A Non-Neoplastic Or Neoplastic Disorder That A... None None None 1 67 1 2025-01-12 14:04:16.617704+00:00 1 None 1
3 20Nq3k7b disease EFO:0000408 None disease or disorder|diseases|medical condition... A Disease Is A Disposition To Undergo Patholog... None None None 1 67 1 2025-01-12 14:04:16.617587+00:00 1 None 1
2 7R1OhRJ7 stroke EFO:0000712 None Cerebral Strokes|Acute Stroke|CVA (Cerebrovasc... A Sudden Loss Of Neurological Function Seconda... None None None 1 67 1 2025-01-12 14:04:15.968507+00:00 1 None 1
Source
uid entity organism name in_db currently_used description url md5 source_website space_id dataframe_artifact_id version run_id created_at created_by_id aux _branch_code
id
67 2a1H bionty.ExperimentalFactor all efo False True The Experimental Factor Ontology http://www.ebi.ac.uk/efo/releases/v3.70.0/efo.owl https://bioportal.bioontology.org/ontologies/EFO 1 None 3.70.0 None 2025-01-12 14:03:53.678206+00:00 1 None 1
32 3Uw2 bionty.CellType all cl False True Cell Ontology http://purl.obolibrary.org/obo/cl/releases/202... https://obophenotype.github.io/cell-ontology 1 None 2024-08-16 None 2025-01-12 14:03:53.671405+00:00 1 None 1
41 MUtA bionty.Tissue all uberon False True Uberon multi-species anatomy ontology http://purl.obolibrary.org/obo/uberon/releases... http://obophenotype.github.io/uberon 1 None 2024-08-07 None 2025-01-12 14:03:53.672225+00:00 1 None 1
103 5JnV BioSample all ncbi False True NCBI BioSample attributes s3://bionty-assets/df_all__ncbi__2023-09__BioS... 918db9bd1734b97c596c67d9654a4126 https://www.ncbi.nlm.nih.gov/biosample/docs/at... 1 None 2023-09 None 2025-01-12 14:03:53.681189+00:00 1 None 1
102 MJRq bionty.Ethnicity human hancestro False True Human Ancestry Ontology https://github.com/EBISPOT/hancestro/raw/3.0/h... 76dd9efda9c2abd4bc32fc57c0b755dd https://github.com/EBISPOT/hancestro 1 None 3.0 None 2025-01-12 14:03:53.681107+00:00 1 None 1
101 6vJm bionty.DevelopmentalStage mouse mmusdv False False Mouse Developmental Stages http://aber-owl.net/media/ontologies/MMUSDV/9/... 5bef72395d853c7f65450e6c2a1fc653 https://github.com/obophenotype/developmental-... 1 None 2020-03-10 None 2025-01-12 14:03:53.681022+00:00 1 None 1
100 10va bionty.DevelopmentalStage mouse mmusdv False True Mouse Developmental Stages https://github.com/obophenotype/developmental-... https://github.com/obophenotype/developmental-... 1 None 2024-05-28 None 2025-01-12 14:03:53.680940+00:00 1 None 1
Tissue
uid name ontology_id abbr synonyms description space_id source_id run_id created_at created_by_id aux _branch_code
id
23 kkib4Wcs lateral structure UBERON:0015212 None None Any Structure That Is Placed On One Side Of Th... 1 41 1 2025-01-12 14:04:10.073448+00:00 1 None 1
22 4QeoxdKp body proper UBERON:0013702 None None The Region Of The Organism Associated With The... 1 41 1 2025-01-12 14:04:10.073370+00:00 1 None 1
21 3XuRxEhw main body axis UBERON:0013701 None None A Principle Subdivision Of An Organism That In... 1 41 1 2025-01-12 14:04:10.073291+00:00 1 None 1
20 7ZCdHnvN subdivision of organism along main body axis UBERON:0011676 None axial subdivision of organism A Major Subdivision Of An Organism That Divide... 1 41 1 2025-01-12 14:04:10.073213+00:00 1 None 1
19 4o2HviGe multicellular anatomical structure UBERON:0010000 None multicellular structure An Anatomical Structure That Has More Than One... 1 41 1 2025-01-12 14:04:10.073131+00:00 1 None 1
18 31GPuSXP subdivision of trunk UBERON:0009569 None trunk subdivision|region of trunk None 1 41 1 2025-01-12 14:04:10.073054+00:00 1 None 1
17 4IV77xkH thoracic segment organ UBERON:0005181 None None An Organ That Part Of The Thoracic Segment Reg... 1 41 1 2025-01-12 14:04:10.072976+00:00 1 None 1

View lineage:

artifact.view_lineage()
_images/f9432a4e86504eab47a3f8e9c81aae92e9ac6db4923066f8fc61764336310c08.svg

The transferred dataset is linked to a special type of transform that stores the slug and uid of the source instance:

artifact.transform.description
'Transfer from `laminlabs/lamindata`'

The transform key has the form f"transfers/{source_instance.uid}":

artifact.transform.key
'transfers/4XIuR0tvaiXM'
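
Because the key follows this pattern, the source instance uid can be read back from it. The sketch below assumes the `transfers/{uid}` form shown above; the helper name `source_uid_from_transfer_key` is hypothetical and not part of lamindb.

```python
TRANSFER_PREFIX = "transfers/"


def source_uid_from_transfer_key(key: str) -> str:
    """Return the source instance uid encoded in a transfer transform key."""
    if not key.startswith(TRANSFER_PREFIX):
        raise ValueError(f"not a transfer transform key: {key!r}")
    return key[len(TRANSFER_PREFIX):]


print(source_uid_from_transfer_key("transfers/4XIuR0tvaiXM"))  # → 4XIuR0tvaiXM
```

The returned uid matches the `instance_uid` of the `s3://lamindata` row in the `ln.Storage.df()` output.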

The current notebook run is linked as the initiated_by_run of the “transfer run”:

artifact.run.initiated_by_run.transform
Transform(uid='ITeOtm7bhtdq0000', is_latest=True, key='transfer.ipynb', description='Transfer data', type='notebook', space_id=1, created_by_id=1, created_at=2025-01-12 14:04:00 UTC)
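
To trace such links programmatically, one could follow `initiated_by_run` until it is empty. This is only a sketch: `run_chain` is a hypothetical helper, not part of lamindb, and the stand-in objects mimic just the `uid` and `initiated_by_run` attributes so the snippet runs without a database. The uids are the ones shown in the Run table of the `ln.view()` output.

```python
from types import SimpleNamespace


def run_chain(run) -> list:
    """Collect a run and every run that (transitively) initiated it."""
    chain = []
    while run is not None:
        chain.append(run)
        run = getattr(run, "initiated_by_run", None)
    return chain


# stand-ins for the notebook run and the transfer run it initiated (illustration only)
notebook_run = SimpleNamespace(uid="hLGFgBO3g6c38sxU2IHo", initiated_by_run=None)
transfer_run = SimpleNamespace(uid="IXvSkUllssrRUGyGjxHX", initiated_by_run=notebook_run)

print([run.uid for run in run_chain(transfer_run)])
```

With a real record, `run_chain(artifact.run)` would start at the transfer run and end at the run that has no initiating run.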
# test the last 3 cells here
# TODO restore the following test
# assert artifact.transform.description == "Transfer from `laminlabs/lamindata`"
# assert artifact.transform.key == "transfers/4XIuR0tvaiXM"
# assert artifact.transform.uid == "4XIuR0tvaiXM0000"
# assert artifact.run.initiated_by_run.transform.description == "Transfer data"

# clean up test instance
!lamin delete --force test-transfer
! calling anonymously, will miss private instances
 deleting instance anonymous/test-transfer