: in region strings passed to --chrom #230

Open
@julibeg

Region strings passed to --chrom can have the format ref_id:start-end to restrict the analysis to part of a reference sequence. However, reference sequence names sometimes contain a : themselves (e.g. if they were created with samtools faidx). In this case, mosdepth fails with chromosome name ... not found when the whole reference name is provided (see example below).

Example:

$ samtools view -h example.bam
@HD     VN:1.6  SO:unsorted     GO:query
@SQ     SN:seq:1-100    LN:100
@PG     ID:samtools     PN:samtools     VN:1.19 CL:samtools view -h example.bam
read    0       seq:1-100       1       4       100M    *       0       0       AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA *       NM:i:0  ms:i:200        AS:i:200        nn:i:0  tp:A:P  cm:i:85 s1:i:100        s2:i:98 de:f:0  rl:i:0

$ mosdepth test example.bam --chrom seq:1-100
[mosdepth] chromosome seq not found

In contrast, samtools commands work with sequence IDs that contain colons, both when the full name is given and when a range is appended to it:

$ cat ref.fa
>seq:1-100
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

$ samtools faidx ref.fa seq:1-100
>seq:1-100
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

$ samtools faidx ref.fa seq:1-100:50-60
>seq:1-100:50-60
AAAAAAAAAAA

Could mosdepth be adapted to do the same?
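
For illustration, here is a minimal Python sketch of one way the region lookup could be made tolerant of colons. It is not mosdepth's or samtools' actual code: `parse_region` and `ref_names` are hypothetical names, and the sketch assumes the reference names from the BAM header are available as a set. The idea is to accept the whole string if it is a known reference name, and only otherwise split at the last colon.

```python
import re

# Hypothetical helper for illustration only; not taken from mosdepth or htslib.
# Matches "start" or "start-end" made of plain digits.
_RANGE = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_region(region, ref_names):
    """Return (name, start, end); start and end are None if no range is given."""
    # 1. The whole string is a known reference name: use it unchanged.
    if region in ref_names:
        return region, None, None

    # 2. Otherwise split at the LAST colon, so colons inside the name survive,
    #    and require the left part to be a known reference name.
    name, sep, rng = region.rpartition(":")
    match = _RANGE.match(rng)
    if sep and name in ref_names and match:
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else None
        return name, start, end

    raise ValueError(f"chromosome {region} not found")


refs = {"seq:1-100"}  # reference names as they would appear in the @SQ lines
print(parse_region("seq:1-100", refs))
print(parse_region("seq:1-100:50-60", refs))
```

With the header from the example above, the first call resolves to the full reference `seq:1-100`, and the second to positions 50-60 of that reference. If a header contained both `seq` and `seq:1-100`, this sketch would silently prefer the exact match; a real implementation might want to report that ambiguity instead.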

Many thanks!
