API
-
class msp2db.parse.LibraryData(msp_pth, db_pth=None, mslevel=None, polarity=None, source=u'unknown', db_type=u'sqlite', password=None, user=None, mysql_db_name=None, chunk=200, schema=u'mona', user_meta_regex=None, user_compound_regex=None, compound_lookup=True, celery_obj=False)
MSP file parser to SQL databases
After creating a SQL database for the library spectra using create_db, MSP files can be parsed into the database using the LibraryData class.
Example
>>> from msp2db.db import create_db
>>> from msp2db.parse import LibraryData
>>> db_pth = 'spectral_library.db'
>>> create_db(file_pth=db_pth)
>>> libdata = LibraryData(msp_pth='MoNA-export-FAHFA.msp',
...                       db_pth=db_pth,
...                       db_type='sqlite',
...                       schema='mona',
...                       source='fahfa',
...                       chunk=200)
Parameters:
- msp_pth (str) – Path to an MSP file or directory [required]
- db_pth (str) – path to sqlite database (only required when using SQLite database) [default None]
- source (str) – Source of the msp files (e.g. massbank) [default ‘unknown’]
- mslevel (int) – If the msp file does not contain the mslevel this can be defined here [default None]
- polarity (str) – If the msp file does not contain the polarity this can be defined here [default None]
- db_type (str) – The type of database to submit to (either ‘sqlite’, ‘mysql’ or ‘django_mysql’) [default sqlite]
- user (str) – Username for database (only required for non Django mysql databases) [default None]
- password (str) – Password for database (only required for non Django mysql databases) [default None]
- mysql_db_name (str) – Name of the mysql database (only required for non Django mysql databases) [default None]
- chunk (int) – Number of spectra to parse per chunk (useful for controlling memory usage) [default 200]
- schema (str) – MSP files vary depending on how they were made. Two standard schemas are available: ‘mona’, based on the MassBank of North America (MoNA) MSP files, and ‘massbank’, based on the more controlled MassBank MSP files (https://github.com/MassBank/MassBank-data) [default ‘mona’]
- user_meta_regex (dict) – For MSP files not derived from either MoNA or MassBank, a custom dictionary of regexes for the spectra metadata can be used [default None]
- user_compound_regex (dict) – For MSP files not derived from either MoNA or MassBank, a custom dictionary of regexes for the compound information can be used [default None]
- compound_lookup (boolean) – Include compound lookup [default True]
- celery_obj (boolean) – If using Django, a Celery task object can be used to keep track of ongoing tasks [default False]
Returns: LibraryData object
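The user_meta_regex and user_compound_regex parameters take dictionaries of regexes. The snippet below is only a sketch of that idea: it assumes each key is a field name mapped to a list of candidate patterns with one capture group. The field names, the patterns, and the extract_meta helper are hypothetical, and the structure msp2db actually expects should be checked against the output of get_meta_regex and get_compound_regex.

```python
import re

# Hypothetical regex dictionary: field name -> list of candidate patterns.
# The structure expected by msp2db may differ; compare with get_meta_regex().
custom_meta_regex = {
    "name": [r"^Name:\s*(.*)$"],
    "precursor_mz": [
        r"^PrecursorMZ:\s*(\d+\.?\d*)$",
        r"^PRECURSOR_M/Z:\s*(\d+\.?\d*)$",
    ],
    "polarity": [r"^Ion_mode:\s*(.*)$"],
}


def extract_meta(lines, regex_dict):
    """Return the first captured value found for each field, or None."""
    meta = {field: None for field in regex_dict}
    for line in lines:
        for field, patterns in regex_dict.items():
            if meta[field] is not None:
                continue  # keep the first value found for this field
            for pattern in patterns:
                match = re.search(pattern, line.strip(), re.IGNORECASE)
                if match:
                    meta[field] = match.group(1).strip()
                    break
    return meta


# Made-up record lines, used only to exercise the helper.
record = ["Name: Example compound", "PrecursorMZ: 255.2324", "Ion_mode: negative"]
meta = extract_meta(record, custom_meta_regex)
```

Listing several patterns per field lets one dictionary cover MSP exporters that spell the same field differently.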
-
get_compound_ids()
Extract the current compound ids in the database. Updates the self.compound_ids list
-
get_db_dict()
Get a dictionary of the library spectra from the associated database
Example
>>> from msp2db.db import create_db
>>> from msp2db.parse import LibraryData
>>> db_pth = 'spectral_library.db'
>>> create_db(file_pth=db_pth)
>>> libdata = LibraryData(msp_pth='MoNA-export-FAHFA.msp',
...                       db_pth=db_pth,
...                       db_type='sqlite',
...                       schema='mona',
...                       source='fahfa',
...                       chunk=200)
>>> libdata.get_db_dict()
If using a large database the resulting dictionary will be very large!
Returns: A dictionary with the keys ‘library_spectra’, ‘library_spectra_meta’, ‘library_spectra_annotations’, ‘library_spectra_source’ and ‘metab_compound’. The value for each key is a list of lists containing all the corresponding rows in the database.
-
insert_data(remove_data=False, db_type=u'sqlite')
Insert data stored in the current chunk of parsing into the selected database
Parameters: - remove_data (boolean) – Remove the data stored within the LibraryData object for the current chunk of processing
- db_type (str) – The type of database to submit to either ‘sqlite’, ‘mysql’ or ‘django_mysql’ [default sqlite]
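The chunked insertion described above can be illustrated with the standard library alone. This is a minimal sketch, not the msp2db implementation: the chunked helper and the three-column table are hypothetical simplifications, and the chunks here are rows rather than whole spectra.

```python
import sqlite3


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; the real msp2db schema has more columns.
conn.execute(
    "CREATE TABLE library_spectra (mz REAL, i REAL, library_spectra_meta_id INTEGER)"
)

# Made-up peak rows, used only to exercise the loop.
peaks = [(100.0 + n, 10.0 * n, n // 5) for n in range(450)]

inserted_per_chunk = []
for chunk in chunked(peaks, 200):
    conn.executemany("INSERT INTO library_spectra VALUES (?, ?, ?)", chunk)
    conn.commit()
    inserted_per_chunk.append(len(chunk))
    # Nothing from the chunk is kept after the commit, which mirrors the
    # intent of remove_data=True: memory use stays bounded by the chunk size.

total = conn.execute("SELECT COUNT(*) FROM library_spectra").fetchone()[0]
```

Committing once per chunk keeps at most one chunk of rows in memory at a time.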
-
msp2db.parse.add_splash_ids(splash_mapping_file_pth, conn, db_type=u'sqlite')
Add splash ids to the database (for cases where they are stored in a different file from the MSP files, as for MoNA)
Example
>>> from msp2db.db import get_connection
>>> from msp2db.parse import add_splash_ids
>>> conn = get_connection('sqlite', 'library.db')
>>> add_splash_ids('splash_mapping_file.csv', conn, db_type='sqlite')
Parameters: splash_mapping_file_pth (str) – Path to the splash mapping file. The file needs to be in CSV format with no header and two columns: the accession number first, then the splash, e.g. AU100601, splash10-0a4i-1900000000-d2bc1c887f6f99ed0f74
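To make the expected file layout concrete, the sketch below reads a headerless two-column CSV with the standard csv module. The read_splash_mapping helper is hypothetical and is not part of msp2db; an in-memory string stands in for the file on disk.

```python
import csv
import io

# Stand-in for a splash mapping file: no header, accession then splash.
example_file = io.StringIO(
    "AU100601,splash10-0a4i-1900000000-d2bc1c887f6f99ed0f74\n"
)


def read_splash_mapping(handle):
    """Return a list of (accession, splash) tuples from a headerless CSV."""
    mapping = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        accession, splash = row[0].strip(), row[1].strip()
        mapping.append((accession, splash))
    return mapping


mapping = read_splash_mapping(example_file)
```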
-
msp2db.db.create_db(file_pth)
Create an empty SQLite database for library spectra.
Example
>>> from msp2db.db import create_db
>>> db_pth = 'library.db'
>>> create_db(file_pth=db_pth)
Parameters: file_pth (str) – File path for SQLite database
-
msp2db.db.db_dict(c)
Get a dictionary of the library spectra from a database
Example
>>> from msp2db.db import db_dict, get_connection
>>> conn = get_connection('sqlite', 'library.db')
>>> test_db_d = db_dict(conn.cursor())
If using a large database the resulting dictionary will be very large!
Parameters: c (cursor) – SQL database connection cursor
Returns: A dictionary with the keys ‘library_spectra’, ‘library_spectra_meta’, ‘library_spectra_annotations’, ‘library_spectra_source’ and ‘metab_compound’. The value for each key is a list of lists containing all the corresponding rows in the database.
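The shape of the returned dictionary can be sketched with the standard sqlite3 module. The tables_to_dict helper and the two simplified tables below are hypothetical and only illustrate the "table name to list of rows" layout; they are not the msp2db implementation or its real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# Two hypothetical, simplified tables standing in for the library schema.
cursor.execute("CREATE TABLE library_spectra_source (id INTEGER, name TEXT)")
cursor.execute("CREATE TABLE metab_compound (inchikey_id TEXT, name TEXT)")
cursor.execute("INSERT INTO library_spectra_source VALUES (1, 'fahfa')")
conn.commit()


def tables_to_dict(c, table_names):
    """Map each table name to a list of its rows, each row as a list."""
    result = {}
    for table in table_names:
        # Table names cannot be bound as parameters, so pass trusted names only.
        c.execute("SELECT * FROM {}".format(table))
        result[table] = [list(row) for row in c.fetchall()]
    return result


db_d = tables_to_dict(cursor, ["library_spectra_source", "metab_compound"])
```

Because every row of every table is loaded into memory at once, this pattern only suits small databases, which is the reason for the size warning above.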
-
msp2db.db.get_connection(db_type, db_pth, user=None, password=None, name=None)
Get a connection to a SQL database. Can be used for SQLite, MySQL or Django MySQL databases
Example
>>> from msp2db.db import get_connection
>>> conn = get_connection('sqlite', 'library.db')
If using “mysql” mysql.connector needs to be installed.
If using “django_mysql” Django needs to be installed.
Parameters: db_type (str) – Type of database, either “sqlite”, “mysql” or “django_mysql”
Returns: SQL connection object
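One way the three db_type values could map onto connection objects is sketched below. This is an assumption about the general approach, not the msp2db source: open_connection is a hypothetical function, and the optional drivers are imported lazily so that only the selected backend has to be installed.

```python
import sqlite3


def open_connection(db_type, db_pth=None, user=None, password=None, name=None):
    """Hypothetical dispatcher; not the msp2db implementation."""
    if db_type == "sqlite":
        return sqlite3.connect(db_pth)
    if db_type == "mysql":
        # Imported lazily so SQLite users do not need the MySQL driver.
        import mysql.connector
        return mysql.connector.connect(user=user, password=password, database=name)
    if db_type == "django_mysql":
        # Django exposes the connection configured in the project settings.
        from django.db import connection
        return connection
    raise ValueError("Unsupported db_type: {!r}".format(db_type))


conn = open_connection("sqlite", ":memory:")
```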
-
msp2db.db.insert_query_m(data, table, conn, columns=None, db_type=u'mysql')
Insert a Python list of tuples into a SQL table
Parameters:
-
msp2db.re.get_compound_regex(schema=u'mona')
Create a dictionary of regexes for extracting the compound information for the spectra
-
msp2db.re.get_meta_regex(schema=u'mona')
Create a dictionary of regexes for extracting the metadata for the spectra
-
msp2db.utils.get_blank_dict(d)
Remove values from dictionary
Parameters: d (dict) – any dictionary
Returns: dictionary with blank values
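As an illustration of what "blank values" could mean, the sketch below returns a copy of a dictionary with every value set to None. Both the blank_copy name and the choice of None as the blank value are assumptions; the library may use a different placeholder or modify the dictionary in place.

```python
def blank_copy(d):
    """Return a new dict with the same keys and every value set to None."""
    return {key: None for key in d}


# Made-up metadata, used only to show the effect.
meta = {"name": "Example compound", "polarity": "negative"}
blank = blank_copy(meta)
```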