API

class msp2db.parse.LibraryData(msp_pth, db_pth=None, mslevel=None, polarity=None, source=u'unknown', db_type=u'sqlite', password=None, user=None, mysql_db_name=None, chunk=200, schema=u'mona', user_meta_regex=None, user_compound_regex=None, compound_lookup=True, celery_obj=False)

Parser for loading MSP files into SQL databases

After creating a SQL database for the library spectra using create_db, MSP files can be parsed into the database using the LibraryData class.

Example

>>> from msp2db.db import create_db
>>> from msp2db.parse import LibraryData
>>> db_pth = 'spectral_library.db'
>>> create_db(file_pth=db_pth)
>>> libdata = LibraryData(msp_pth='MoNA-export-FAHFA.msp',
...                       db_pth=db_pth,
...                       db_type='sqlite',
...                       schema='mona',
...                       source='fahfa',
...                       chunk=200)
Parameters:
  • msp_pth (str) – path to msp file or directory [required]
  • db_pth (str) – path to sqlite database (only required when using SQLite database) [default None]
  • source (str) – Source of the msp files (e.g. massbank) [default ‘unknown’]
  • mslevel (int) – If the msp file does not contain the mslevel this can be defined here [default None]
  • polarity (str) – If the msp file does not contain the polarity this can be defined here [default None]
  • db_type (str) – The type of database to submit to (either ‘sqlite’, ‘mysql’ or ‘django_mysql’) [default ‘sqlite’]
  • user (str) – Username for database (only required for non Django mysql databases) [default None]
  • password (str) – Password for database (only required for non Django mysql databases) [default None]
  • mysql_db_name (str) – Name of the mysql database (only required for non Django mysql databases) [default None]
  • chunk (int) – Number of spectra to parse per chunk (useful to control memory usage) [default 200]
  • schema (str) – MSP files vary depending on how they were made. Two standard schemas are available: ‘mona’, based on the MassBank of North America (MoNA) MSP files, and ‘massbank’, based on the more controlled MassBank MSP files https://github.com/MassBank/MassBank-data [default ‘mona’]
  • user_meta_regex (dict) – For other MSP files not derived from either MoNA or MassBank a custom dictionary of regexes can be used [default None]
  • user_compound_regex (dict) – For other MSP files not derived from either MoNA or MassBank a custom dictionary of regexes can be used [default None]
  • compound_lookup (boolean) – Include compound lookup [default True]
  • celery_obj (boolean) – If using Django, a Celery task object can be used to keep track of ongoing tasks [default False]
Returns:

LibraryData object

close()

Close the database connections

get_compound_ids()

Extract the current compound ids in the database. Updates the self.compound_ids list

get_db_dict()

Get a dictionary of the library spectra from the associated database

Example

>>> from msp2db.db import create_db
>>> from msp2db.parse import LibraryData
>>> db_pth = 'spectral_library.db'
>>> create_db(file_pth=db_pth)
>>> libdata = LibraryData(msp_pth='MoNA-export-FAHFA.msp',
...                       db_pth=db_pth,
...                       db_type='sqlite',
...                       schema='mona',
...                       source='fahfa',
...                       chunk=200)
>>> libdata.get_db_dict()

If using a large database the resulting dictionary will be very large!

Returns:A dictionary with the keys ‘library_spectra’, ‘library_spectra_meta’, ‘library_spectra_annotations’, ‘library_spectra_source’ and ‘metab_compound’. The value for each key is a list of lists containing all the rows of the corresponding database table.
insert_data(remove_data=False, db_type=u'sqlite')

Insert data stored in the current chunk of parsing into the selected database

Parameters:
  • remove_data (boolean) – Remove the data stored within the LibraryData object for the current chunk of processing
  • db_type (str) – The type of database to submit to either ‘sqlite’, ‘mysql’ or ‘django_mysql’ [default sqlite]
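
The buffer-and-flush pattern implied by chunk and insert_data can be sketched with the standard library alone. The spectra_demo table, its columns and the insert_in_chunks helper are hypothetical and are not msp2db's schema or implementation. The sketch only shows rows being buffered, written once the buffer reaches the chunk size, and then cleared so that memory use stays bounded.

```python
import sqlite3

INSERT_SQL = "INSERT INTO spectra_demo (mz, intensity) VALUES (?, ?)"

def insert_in_chunks(conn, rows, chunk=200):
    """Buffer rows and write them every `chunk` rows (hypothetical helper)."""
    buffer = []
    flushes = 0
    for row in rows:
        buffer.append(row)
        if len(buffer) >= chunk:
            conn.executemany(INSERT_SQL, buffer)
            conn.commit()
            buffer = []  # similar in spirit to remove_data=True
            flushes += 1
    if buffer:  # write whatever is left after the last full chunk
        conn.executemany(INSERT_SQL, buffer)
        conn.commit()
        flushes += 1
    return flushes

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectra_demo (mz REAL, intensity REAL)")
rows = [(100.0 + i, float(i)) for i in range(450)]
n_flushes = insert_in_chunks(conn, rows, chunk=200)
n_rows = conn.execute("SELECT COUNT(*) FROM spectra_demo").fetchone()[0]
```

A smaller chunk lowers peak memory at the cost of more commits.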
msp2db.parse.add_splash_ids(splash_mapping_file_pth, conn, db_type=u'sqlite')

Add SPLASH ids to the database, for cases where they are stored in a separate file from the MSP files (as with MoNA)

Example

>>> from msp2db.db import get_connection
>>> from msp2db.parse import add_splash_ids
>>> conn = get_connection('sqlite', 'library.db')
>>> add_splash_ids('splash_mapping_file.csv', conn, db_type='sqlite')
Parameters:splash_mapping_file_pth (str) – Path to the splash mapping file. The file must be in CSV format with no header row and two columns: the accession number first, then the SPLASH, e.g. AU100601, splash10-0a4i-1900000000-d2bc1c887f6f99ed0f74
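
The mapping file layout described above (two columns, no header) can be produced and checked with the csv module. This is a minimal sketch; the temporary directory and the variable names are illustrative only, and the single row is the example from the parameter description.

```python
import csv
import os
import tempfile

# Two columns, no header row: accession number, then SPLASH
example_rows = [
    ["AU100601", "splash10-0a4i-1900000000-d2bc1c887f6f99ed0f74"],
]

tmp_dir = tempfile.mkdtemp()
mapping_pth = os.path.join(tmp_dir, "splash_mapping_file.csv")
with open(mapping_pth, "w", newline="") as handle:
    csv.writer(handle).writerows(example_rows)

# Read the file back as (accession, splash) pairs to check the layout
with open(mapping_pth, newline="") as handle:
    pairs = [(row[0].strip(), row[1].strip())
             for row in csv.reader(handle) if row]
```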
msp2db.db.create_db(file_pth)

Create an empty SQLite database for library spectra.

Example

>>> from msp2db.db import create_db
>>> db_pth = 'library.db'
>>> create_db(file_pth=db_pth)
Parameters:file_pth (str) – File path for SQLite database
msp2db.db.db_dict(c)

Get a dictionary of the library spectra from a database

Example

>>> from msp2db.db import get_connection, db_dict
>>> conn = get_connection('sqlite', 'library.db')
>>> test_db_d = db_dict(conn.cursor())

If using a large database the resulting dictionary will be very large!

Parameters:c (cursor) – SQL database connection cursor
Returns:A dictionary with the keys ‘library_spectra’, ‘library_spectra_meta’, ‘library_spectra_annotations’, ‘library_spectra_source’ and ‘metab_compound’. The value for each key is a list of lists containing all the rows of the corresponding database table.
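
The shape of the returned dictionary (table name mapped to a list of row lists) can be illustrated with sqlite3 alone. The tables_to_dict helper and the two-column demo table are hypothetical and do not reproduce msp2db's implementation or schema.

```python
import sqlite3

def tables_to_dict(cursor, tables):
    """Return {table_name: [[value, ...], ...]} (hypothetical helper)."""
    result = {}
    for table in tables:
        # Table names cannot be bound as parameters, so pass trusted names only
        cursor.execute("SELECT * FROM {}".format(table))
        result[table] = [list(row) for row in cursor.fetchall()]
    return result

conn = sqlite3.connect(":memory:")
c = conn.cursor()
# Demo table with an assumed layout, only to have something to read back
c.execute("CREATE TABLE library_spectra_source (id INTEGER, name TEXT)")
c.execute("INSERT INTO library_spectra_source VALUES (1, 'fahfa')")
demo = tables_to_dict(c, ["library_spectra_source"])
```

Because every row of every table is held in memory at once, this pattern suits small databases and tests rather than full libraries.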
msp2db.db.get_connection(db_type, db_pth, user=None, password=None, name=None)

Get a connection to a SQL database. Can be used for SQLite, MySQL or Django MySQL databases.

Example

>>> from msp2db.db import get_connection
>>> conn = get_connection('sqlite', 'library.db')

If using “mysql” mysql.connector needs to be installed.

If using “django_mysql” Django needs to be installed.

Parameters:db_type (str) – Type of database, either “sqlite”, “mysql” or “django_mysql”
Returns:sql connection object
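
One way to dispatch on db_type while importing the optional dependencies only when they are needed is sketched below. The open_connection helper is hypothetical and is not msp2db's code. Only the SQLite branch is exercised here, since the other two need a MySQL server or a configured Django project.

```python
import sqlite3

def open_connection(db_type, db_pth=None, user=None, password=None, name=None):
    """Return a DB-API connection for the requested backend (hypothetical)."""
    if db_type == "sqlite":
        return sqlite3.connect(db_pth)
    if db_type == "mysql":
        import mysql.connector  # third-party, imported only when requested
        return mysql.connector.connect(user=user, password=password,
                                       database=name)
    if db_type == "django_mysql":
        from django.db import connection  # needs a configured Django project
        return connection
    raise ValueError("unsupported db_type: {!r}".format(db_type))

conn_demo = open_connection("sqlite", ":memory:")
one = conn_demo.execute("SELECT 1").fetchone()
```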
msp2db.db.insert_query_m(data, table, conn, columns=None, db_type=u'mysql')

Insert a Python list of tuples into a SQL table

Parameters:
  • data (list) – List of tuples
  • table (str) – Name of database table
  • conn (connection object) – database connection object
  • columns (str) – String of column names to use. If not assigned, all columns are presumed to be used [Optional]
  • db_type (str) – Type of database, either “sqlite” or “mysql” [default “mysql”]
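
A bulk insert of this kind is commonly written with executemany. The insert_rows helper below is a hypothetical sketch, not msp2db's implementation. It relies on the fact that sqlite3 uses “?” placeholders while mysql.connector uses “%s”, which is one reason a db_type argument is needed.

```python
import sqlite3

def insert_rows(data, table, conn, columns=None, db_type="sqlite"):
    """Insert a list of tuples into `table` (hypothetical sketch)."""
    if not data:
        return
    # sqlite3 expects '?' placeholders, mysql.connector expects '%s'
    mark = "?" if db_type == "sqlite" else "%s"
    placeholders = ", ".join([mark] * len(data[0]))
    column_sql = " ({})".format(columns) if columns else ""
    stmt = "INSERT INTO {}{} VALUES ({})".format(table, column_sql,
                                                 placeholders)
    cursor = conn.cursor()
    cursor.executemany(stmt, data)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, name TEXT, note TEXT)")
# Only two of the three columns are supplied, so `columns` names them
insert_rows([(1, "a"), (2, "b")], "demo", conn, columns="id, name")
stored = conn.execute("SELECT id, name, note FROM demo ORDER BY id").fetchall()
```

Table and column names are interpolated into the statement, so they should come from trusted code rather than user input.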
msp2db.re.get_compound_regex(schema=u'mona')

Create a dictionary of regular expressions for extracting the compound information for the spectra

msp2db.re.get_meta_regex(schema=u'mona')

Create a dictionary of regular expressions for extracting the metadata for the spectra
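
To give a feel for how a dictionary of regular expressions can pull metadata out of MSP lines, here is a minimal sketch. The field names, the patterns and the assumption that each key maps to a list of patterns are all illustrative; the dictionaries actually returned by get_meta_regex and get_compound_regex, and the structure expected by user_meta_regex, may differ.

```python
import re

# Hypothetical patterns keyed by field name
meta_regex = {
    "precursor_mz": [r"^PrecursorMZ:\s*(\d*[.,]?\d+)"],
    "ms_level": [r"^Spectrum_type:\s*MS(\d)", r"^ms.?level:\s*\D*(\d)"],
    "polarity": [r"^Ion_mode:\s*(\w+)"],
}

def extract_meta(line, regex_dict):
    """Return (field, value) for the first matching pattern, else None."""
    for field, patterns in regex_dict.items():
        for pattern in patterns:
            match = re.search(pattern, line, re.IGNORECASE)
            if match:
                return field, match.group(1).strip()
    return None

lines = ["PrecursorMZ: 325.2", "Ion_mode: P", "Num Peaks: 12"]
found = [extract_meta(line, meta_regex) for line in lines]
```

Lines that match no pattern, such as the peak count here, yield None and would be handled elsewhere in a real parser.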

msp2db.utils.get_blank_dict(d)

Blank the values of a dictionary while keeping its keys

Parameters:d (dict) – any dictionary
Returns:dictionary with blank values
msp2db.utils.get_precursor_mz(exact_mass, precursor_type)

Calculate precursor mz based on exact mass and precursor type

Parameters:
  • exact_mass (float) – exact mass of compound of interest
  • precursor_type (str) – Precursor type (currently only works with ‘[M-H]-’, ‘[M+H]+’ and ‘[M+H-H2O]+’)
Returns:

precursor m/z of the compound
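
The arithmetic behind the three supported precursor types can be worked through as follows. The mass constants are standard monoisotopic values assumed for illustration and are not read from msp2db; the precursor_mz helper and its None return for unsupported types are likewise hypothetical.

```python
# Monoisotopic mass shifts in Da (assumed standard values)
PROTON = 1.007276
WATER = 18.010565

ADDUCT_SHIFT = {
    "[M+H]+": PROTON,               # gain of a proton
    "[M-H]-": -PROTON,              # loss of a proton
    "[M+H-H2O]+": PROTON - WATER,   # proton gain plus water loss
}

def precursor_mz(exact_mass, precursor_type):
    """Shift exact_mass by the adduct; None if the type is unsupported."""
    shift = ADDUCT_SHIFT.get(precursor_type)
    if shift is None:
        return None
    return exact_mass + shift

glucose = 180.063388  # monoisotopic mass of C6H12O6
mz_pos = precursor_mz(glucose, "[M+H]+")
mz_neg = precursor_mz(glucose, "[M-H]-")
```

All three types are singly charged, so the m/z is simply the neutral mass plus the adduct shift.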

msp2db.utils.line_count(fn)

Get line count of file

Parameters:fn (str) – Path to file
Returns:Number of lines in file (int)
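
Counting lines without loading the whole file is a one-liner in Python. The sketch below uses a hypothetical count_lines helper, not msp2db's own code, and writes a small temporary file with made-up MSP-like content.

```python
import os
import tempfile

def count_lines(fn):
    """Count lines by iterating, so the file is never held in memory."""
    with open(fn, "rb") as handle:
        return sum(1 for _ in handle)

# Small temporary file standing in for an MSP file
fd, demo_pth = tempfile.mkstemp(suffix=".msp")
with os.fdopen(fd, "w") as handle:
    handle.write("Name: demo\nPrecursorMZ: 325.2\nNum Peaks: 1\n100.0 999\n")
n_lines = count_lines(demo_pth)
os.remove(demo_pth)
```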