Getting started

MicroHapDB is designed as a community resource that requires minimal infrastructure to use and maintain. It has no web interface: the recommended way to access the database is through a software package that comes bundled with the database contents. Instructions for installing the MicroHapDB package are provided here. Once the package is installed, the database contents can be accessed in any of the following ways.

Command line interface

MicroHapDB provides a simple and self-documenting text interface for database query and retrieval. Using the microhapdb command in the terminal, a user can specify filtering criteria to select population, marker, and frequency data, and format this data in a variety of ways.

  • microhapdb population for retrieving information on populations for which MicroHapDB has allele frequency data

  • microhapdb marker for retrieving marker information

  • microhapdb frequency for retrieving microhap population frequencies

  • microhapdb lookup for retrieving individual records of any type

Reference documentation for these commands, including detailed instructions for configuring and running database queries, is available here. Alternatively, appending --help to any of the commands above (e.g. microhapdb marker --help) prints the same documentation to the terminal.
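
For scripted workflows, the same help text can be captured from Python. The following is a minimal sketch, not part of MicroHapDB: the helper names help_command and get_help are hypothetical, and the only command-line facts assumed are the four subcommands and the --help flag described above.

```python
import shutil
import subprocess

# The four subcommands named in the list above.
SUBCOMMANDS = ("population", "marker", "frequency", "lookup")


def help_command(subcommand):
    """Build the argument list for `microhapdb <subcommand> --help`."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return ["microhapdb", subcommand, "--help"]


def get_help(subcommand):
    """Return the help text, or None if microhapdb is not on the PATH."""
    if shutil.which("microhapdb") is None:
        return None
    result = subprocess.run(
        help_command(subcommand), capture_output=True, text=True, check=False
    )
    return result.stdout
```

Calling get_help("marker") returns the documentation as a string when the package is installed, and None otherwise, so a script can fail gracefully on systems without MicroHapDB.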

Python API

For users with programming experience, the contents of MicroHapDB can be accessed programmatically through the microhapdb Python package. Adding import microhapdb at the top of a Python program provides access to the following resources.

Database tables

MicroHapDB consists of two primary database tables. Each is stored in memory as a pandas DataFrame object.

  • microhapdb.markers

  • microhapdb.frequencies

Additional auxiliary tables are also provided, including the following.

  • microhapdb.populations: contains descriptions of each population group for which frequency data is available

  • microhapdb.merged: contains a mapping of microhap identifiers that were merged during the database build process

  • microhapdb.variantmap: contains a mapping of dbSNP variants to their corresponding microhap markers

  • microhapdb.indels: contains variant information for markers that include insertion/deletion variants (DEPRECATED)
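
Because the tables are DataFrame objects, the usual pandas operations should apply to them. The sketch below does not import microhapdb; it builds a small stand-in table from three rows and four columns of the marker table output shown later in this section, so the variable markers here is an illustrative substitute for microhapdb.markers, not the real table.

```python
import pandas as pd

# Stand-in for microhapdb.markers: three rows copied from the marker table
# output shown later in this section. The real table has many more rows
# and additional columns.
markers = pd.DataFrame(
    {
        "Name": ["mh02USC-2pA", "mh06USC-6pB", "mh17USC-17qA"],
        "Chrom": ["chr2", "chr6", "chr17"],
        "Ae": [2.7695, 3.1711, 3.5538],
        "Source": ["10.1016/j.fsigen.2019.102213"] * 3,
    }
)

# Boolean indexing: keep the markers whose Ae value exceeds 3.0.
high_ae = markers[markers["Ae"] > 3.0]

# Sort by Ae, highest first, and keep two columns for display.
ranked = high_ae.sort_values("Ae", ascending=False)[["Name", "Ae"]]
print(ranked.to_string(index=False))
```

With the package installed, replacing the stand-in with microhapdb.markers should allow the same filtering and sorting on the full table.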

Convenience functions for data retrieval

The microhapdb.Marker and microhapdb.Population modules include functions for retrieving database records based on IDs, names, genomic position, and other attributes. Some functions return Marker or Population objects with various helpful attributes and methods.

>>> marker = microhapdb.Marker.from_id("mh02USC-2pC")
>>> print(marker)
mh02USC-2pC (chr2:79025823-79025903)
>>> for marker in microhapdb.Marker.from_ids(["mh06USC-6pB", "mh17USC-17qA", "mh02USC-2pA"]):
...   print(marker)
... 
mh02USC-2pA (chr2:10810991-10811070)
mh06USC-6pB (chr6:53836249-53836261)
mh17USC-17qA (chr17:27762200-27762288)
>>> for marker in microhapdb.Marker.from_region("chr11:25000000-50000000"):
...   print(marker)
... 
mh11ZBF-001 (chr11:27379841-27379905)
mh11PK-63643 (chr11:34415814-34415851)
mh11USC-11pB (chr11:34415816-34415837)
>>> panel = list(microhapdb.Marker.from_query("Source == '10.1007/s00414-020-02483-x'"))
>>> len(panel)
59
>>> population = microhapdb.Population.from_id("IBS")
>>> print(population)
IBS     Iberian Population in Spain     1KGP
>>> for population in microhapdb.Population.from_ids(["GBR", "FIN", "CEU"]):
...   print(population)
... 
GBR     British in England and Scotland 1KGP
FIN     Finnish in Finland      1KGP
CEU     Utah Residents (CEPH) with Northern and Western European Ancestry       1KGP
>>> for population in microhapdb.Population.from_query("Name.str.contains('Afr')"):
...   print(population)
... 
MHDBP-3dab7bdd14        Africa  10.1016/j.fsigen.2018.05.008
SA000101C       African Americans       ALFRED
ACB     African Caribbeans in Barbados  1KGP
ASW     Americans of African Ancestry in SW USA 1KGP

A similar set of functions returns DataFrame objects containing subsets of the primary database tables.

>>> microhapdb.Marker.table_from_ids(["mh06USC-6pB", "mh17USC-17qA", "mh02USC-2pA"])
             Name          PermID Reference  Chrom                              Offsets      Ae      In     Fst                        Source
48    mh02USC-2pA  MHDBM-1734fe04    GRCh38   chr2  10810991,10811035,10811042,10811069  2.7695  0.3702  0.2143  10.1016/j.fsigen.2019.102213
226   mh06USC-6pB  MHDBM-7cd89ff8    GRCh38   chr6           53836249,53836252,53836260  3.1711  0.1165  0.0948  10.1016/j.fsigen.2019.102213
534  mh17USC-17qA  MHDBM-cd7a9041    GRCh38  chr17  27762200,27762204,27762238,27762287  3.5538  0.0604 -0.0283  10.1016/j.fsigen.2019.102213
>>> microhapdb.Marker.table_from_region("chr11:25000000-50000000")
             Name          PermID Reference  Chrom                                            Offsets      Ae      In     Fst                        Source
368   mh11ZBF-001  MHDBM-6a26f27d    GRCh38  chr11                                  27379841,27379901  2.4158  0.0755  0.0262        10.1002/elps.201900451
369  mh11PK-63643  MHDBM-c5ce121f    GRCh38  chr11  34415814,34415816,34415818,34415835,34415836,3...     NaN     NaN     NaN  10.1016/j.fsigen.2018.05.008
370  mh11USC-11pB  MHDBM-2408c5b4    GRCh38  chr11                34415816,34415818,34415835,34415836  3.9841  0.1404  0.1346  10.1016/j.fsigen.2019.102213
>>> panel = microhapdb.Marker.table_from_query("Source == '10.1007/s00414-020-02483-x'")
>>> panel.shape
(59, 9)
>>> microhapdb.Population.table_from_ids(["GBR", "FIN", "CEU"])
      ID                                               Name Source
12   GBR                    British in England and Scotland   1KGP
26   FIN                                 Finnish in Finland   1KGP
103  CEU  Utah Residents (CEPH) with Northern and Wester...   1KGP
>>> microhapdb.Population.table_from_query("Name.str.contains('Afr')")
                 ID                                     Name                        Source
2  MHDBP-3dab7bdd14                                   Africa  10.1016/j.fsigen.2018.05.008
3         SA000101C                        African Americans                        ALFRED
4               ACB           African Caribbeans in Barbados                          1KGP
5               ASW  Americans of African Ancestry in SW USA                          1KGP
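
To make the region lookups above more concrete, the following sketch shows one way a region string such as chr11:25000000-50000000 could be parsed and matched against marker coordinates. It is an illustration only and not MicroHapDB's implementation: Interval, parse_region, and markers_in_region are hypothetical names, the records are copied from the printed marker output above, and the inclusive overlap test is an assumption.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


# Stand-in records copied from the printed marker output above.
MARKERS = [
    Interval("mh02USC-2pA", "chr2", 10810991, 10811070),
    Interval("mh06USC-6pB", "chr6", 53836249, 53836261),
    Interval("mh11ZBF-001", "chr11", 27379841, 27379905),
    Interval("mh11PK-63643", "chr11", 34415814, 34415851),
    Interval("mh11USC-11pB", "chr11", 34415816, 34415837),
    Interval("mh17USC-17qA", "chr17", 27762200, 27762288),
]


def parse_region(region):
    """Split a string like 'chr11:25000000-50000000' into (chrom, start, end)."""
    chrom, _, span = region.partition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def markers_in_region(region, markers=MARKERS):
    """Return markers on the region's chromosome whose span overlaps it."""
    chrom, start, end = parse_region(region)
    return [
        m for m in markers
        if m.chrom == chrom and m.start <= end and m.end >= start
    ]


for marker in markers_in_region("chr11:25000000-50000000"):
    print(f"{marker.name} ({marker.chrom}:{marker.start}-{marker.end})")
```

On this stand-in data the loop selects the same three chr11 markers that Marker.from_region returns in the session above.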

Direct file access

For users who have successfully installed MicroHapDB but do not want to access the database contents through the CLI or API, the database tables can be accessed directly and loaded into Excel, R, Python, or any other data analysis environment. Running microhapdb --files on the command line prints the location of these files on the local system.

WARNING: Modifying the contents of the database files may cause problems with MicroHapDB. Any user wishing to sort, filter, or otherwise manipulate the contents of the core database files should instead make copies of those files and manipulate the copies.
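
One way to follow this advice is to copy the files into a separate working directory before opening them. The sketch below uses only the Python standard library. The function name copy_tables is hypothetical, and because the output format of microhapdb --files is not described here, the function takes a list of file paths supplied by the user rather than parsing that output.

```python
import shutil
from pathlib import Path


def copy_tables(paths, dest_dir):
    """Copy each file into dest_dir and return the paths of the copies."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copies = []
    for path in paths:
        target = dest / Path(path).name
        shutil.copy2(path, target)  # the original file is left untouched
        copies.append(target)
    return copies
```

The paths reported by microhapdb --files can be passed in as the first argument; all sorting, filtering, and editing is then done on the returned copies.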