Python script to extract metadata for a list of given BioSample ids from the NCBI BioSample database. Data is extracted from XML files and output to a CSV.
Note
CSV output works best when comparing metadata from different BioSamples in the same BioProject, because the tag names in the XML will be consistent. Comparing runs from different BioProjects can result in a messy CSV output.
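To illustrate why inconsistent tag names lead to a messy CSV, here is a minimal sketch of the XML-to-CSV step. It is not the code in bs2csv.py: the function names `parse_samples` and `rows_to_csv` are hypothetical, the embedded XML is a simplified, made-up record shaped like BioSample XML, and the choice to use the union of all attribute names as the CSV header is an assumption.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified, made-up records shaped like BioSample XML (not real NCBI data).
SAMPLE_XML = """\
<BioSampleSet>
  <BioSample accession="SAMN00000001">
    <Attributes>
      <Attribute attribute_name="strain">A1</Attribute>
      <Attribute attribute_name="host">Homo sapiens</Attribute>
    </Attributes>
  </BioSample>
  <BioSample accession="SAMN00000002">
    <Attributes>
      <Attribute attribute_name="strain">B2</Attribute>
      <Attribute attribute_name="isolation_source">soil</Attribute>
    </Attributes>
  </BioSample>
</BioSampleSet>
"""


def parse_samples(xml_text):
    """Return one dict per BioSample: accession plus attribute name/value pairs."""
    rows = []
    root = ET.fromstring(xml_text)
    for sample in root.iter("BioSample"):
        row = {"accession": sample.get("accession", "")}
        for attr in sample.iter("Attribute"):
            name = attr.get("attribute_name")
            if name:
                row[name] = (attr.text or "").strip()
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Write rows to CSV text, using the union of all keys as the header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills the cells of tags that a given sample does not have.
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(parse_samples(SAMPLE_XML)))
```

In this sketch, every tag name that appears in only some samples becomes a column with empty cells for the others, so samples with inconsistent tag names produce a wide, sparse table.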
Install the dependencies:

    # [optional] create/load virtualenv
    pip install -r requirements.txt

Run the script on a list of BioSample ids:

    python bs2csv.py input_ids.txt

- `input_ids.txt` is a text file containing newline-separated NCBI BioSample accession ids

Optionally, set the output file and restrict the extracted tags:

    python bs2csv.py input_ids.txt -o metadata_output.csv -v values.txt

- `metadata_output.csv` is the name of the desired output file. Defaults to `biosample_metadata.csv`
- `values.txt` is a text file containing newline-separated values that are used when extracting metadata
  - only the information from tags found in `values.txt` will be stored in the output
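
The effect of a values file can be sketched as follows. This is an illustration only: `load_values` and `filter_row` are hypothetical helpers, and exact, case-sensitive matching on tag names is an assumption, since how bs2csv.py matches values to tags is not described here.

```python
def load_values(text):
    """Parse newline-separated values, skipping blank lines and stray whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def filter_row(row, wanted, id_key="accession"):
    """Keep the id column plus only the tags listed in `wanted`."""
    kept = {id_key: row.get(id_key, "")}
    for name in wanted:
        # Assumption: a tag is kept only on an exact name match.
        if name in row:
            kept[name] = row[name]
    return kept


if __name__ == "__main__":
    values_text = "strain\nhost\n\n"  # stands in for the contents of values.txt
    row = {
        "accession": "SAMN00000001",  # made-up accession
        "strain": "A1",
        "host": "Homo sapiens",
        "collection_date": "2020",
    }
    print(filter_row(row, load_values(values_text)))
```

Here `collection_date` is dropped because it is not listed in the values text, while the accession is always kept so each row stays identifiable.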