sequencings
Parsing
Tool used to parse vcf's: vt
When a sequencing is added to the db:
- use vt to create a decomposed vcf
- decomposition creates a biallelic variant out of a multiallelic variant. There is a possible loss of information during this process so annotations of former multiallelic variants should be recomputed Source
- use vt to create a normalized vcf
- A vcf file is normalized if all its variants are parsimonious (as short as possible without having a length of 0) and left aligned (have no duplicated nucleotides)
Load
When a sequencing is loaded (Celery chained tasks):
- load samples
- read the sample names and create the sample db object
- load variants
- delete all former sample_records (if sequencing has been loaded before)
- create all contigs and regions present in the vcf as region objects
- For each contig:
- load normalized vcf
- Create a dictionary {sample_name:sample_object}
- get all variants included in vcf
- for each variant
- create variant object
- for each sample:
- add variant info to metadata
- add sample info to metadata field sample
- remove initial samples field from metadata
- create samplerecords object
Storage
Keep original vcf during development phase for traceability. Once the process is stable, delete original after decomposition. Keep normalized vcf while sequencing is in db for QC processes
Once a sequencing is deleted, remove all linked vcf's from server
data model
classDiagram
direction BT
class Sequencing {
<< MetadataMixin, SensitiveMixin, VCFSensitiveMixin >>
# JSONField metadata
# ForeignKey created_by
# DateTimeField created_at
# ForeignKey changed_by
# DateTimeField changed_at
# FileField vcf_file
# FileField vcf_file_decomposed
# FileField vcf_file_normalized
+ ForeignKey assession
+ PositiveSmallIntegerField loading
+ BooleanField locked
}
Sequencing "*" --* "1" Assession : assession
Sequencing "*" --* "1" User : created_by
Sequencing "*" --* "1" User : changed_by
class Sample {
<< SensitiveMixin, MetadataMixin >>
# JSONField metadata
# ForeignKey created_by
# DateTimeField created_at
# ForeignKey changed_by
# DateTimeField changed_at
+ CharField name
+ ForeignKey sequencing
+ ForeignKey person ~null=True~
+ ManyToManyField projects ~through=SampleProjectRelationship~
+ ManyToManyField variants ~through=SampleRecord~
}
Sample "*" --* "1" Sequencing : sequencing
Sample "*" --* "1" User : created_by
Sample "*" --* "1" User : changed_by
Sample "*" ..* "1" Person : person
class SampleProjectRelationship {
<< M2M >>
+ ForeignKey sample
+ ForeignKey project
}
SampleProjectRelationship "*" --* "1" Project : project
SampleProjectRelationship "*" --* "1" Sample : sample
class SampleRecord {
<< MetadataMixin >>
# JSONField metadata
+ ForeignKey sample
+ ForeignKey variant
}
SampleRecord "*" --* "1" Variant : variant
SampleRecord "*" --* "1" Sample : sample
Last update:
October 4, 2023