Genomes

Generate genomes for snailz with random mutations.

GenePool dataclass

Keep track of generated genomes.

Source code in snailz/genomes.py
@dataclass
class GenePool:
    '''Keep track of generated genomes.'''

    length: int
    reference: str
    individuals: list[str]
    locations: list[int]
    susceptible_loc: int = 0
    susceptible_base: str = ''
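
As a minimal sketch of how such a record might be built and turned into a plain dictionary, the snippet below re-declares the dataclass locally so that it runs on its own. The sample values are invented for illustration and are not real snailz output.

```python
from dataclasses import asdict, dataclass


@dataclass
class GenePool:
    """Local copy of the dataclass above so the example is self-contained."""

    length: int
    reference: str
    individuals: list[str]
    locations: list[int]
    susceptible_loc: int = 0
    susceptible_base: str = ''


# Invented sample values, for illustration only.
pool = GenePool(
    length=4,
    reference='ACGT',
    individuals=['ACGT', 'ACTT'],
    locations=[2],
)

# The two susceptibility fields keep their defaults until they are filled in.
record = asdict(pool)
print(record['susceptible_loc'], repr(record['susceptible_base']))
```

Because the last two fields have defaults, a pool can be created first and have its susceptibility information assigned in a later step.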

genomes(options)

Main driver for genome generation.

Each genome is a string of ACGT bases of the same length. One location is randomly chosen as "significant", and a specific mutation there predisposes the snail to size changes. Other mutations are added randomly at other locations.

  • options.params: parameter file.
  • options.outfile: output file.

The result is saved as JSON with the following entries:

  • length: fixed length of all genomes.
  • reference: the unmutated reference genome.
  • individuals: a list of individual genomes with mutations.
  • locations: a list of locations where mutations may occur.
  • susceptible_loc: one of those locations where the significant mutation may occur.
  • susceptible_base: the mutated base at that location that indicates susceptibility.

Source code in snailz/genomes.py
def genomes(options: Namespace) -> None:
    '''Main driver for genome generation.

    Each genome is a string of ACGT bases of the same length.
      One location is randomly chosen as "significant",
      and a specific mutation there predisposes the snail to size changes.
      Other mutations are added randomly at other locations.

    - options.params: parameter file.
    - options.outfile: output file.

    The result is saved as JSON with the following entries:

    - length: fixed length of all genomes.
    - reference: the unmutated reference genome.
    - individuals: a list of individual genomes with mutations.
    - locations: a list of locations where mutations may occur.
    - susceptible_loc: one of those locations where the significant mutation may occur.
    - susceptible_base: the mutated base at that location that indicates susceptibility.
    '''
    assert options.params != options.outfile, 'Cannot use same filename for parameters and output'
    options.params = load_params(GenomeParams, options.params)
    random.seed(options.params.seed)
    data = _random_genomes(options.params)
    _add_susceptibility(data)
    _save(options.outfile, data)
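
To illustrate how the saved JSON might be consumed, here is a small sketch that parses a record in the documented shape and lists the individuals carrying the susceptible base. The JSON text and the helper name `susceptible_indices` are hypothetical and exist only for this example.

```python
import json

# Hypothetical output in the documented shape; the values are invented.
text = '''
{
    "length": 4,
    "reference": "ACGT",
    "individuals": ["ACGT", "ACTT", "AGTT"],
    "locations": [1, 2],
    "susceptible_loc": 2,
    "susceptible_base": "T"
}
'''


def susceptible_indices(pool: dict) -> list[int]:
    """Return the indices of individuals that carry the susceptible base."""
    loc = pool['susceptible_loc']
    base = pool['susceptible_base']
    return [i for i, ind in enumerate(pool['individuals']) if ind[loc] == base]


pool = json.loads(text)
print(susceptible_indices(pool))  # [1, 2]
```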

_add_susceptibility(data)

Add indication of genetic susceptibility.

Parameters:

  • data (GenePool, required): a GenePool instance being populated.

Source code in snailz/genomes.py
def _add_susceptibility(data: GenePool) -> None:
    '''Add indication of genetic susceptibility.

    Args:
        data: a GenePool instance being populated.
    '''
    if not data.locations:
        return
    loc = _choose_one(data.locations)
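    choices = {ind[loc] for ind in data.individuals} - {data.reference[loc]}
    if not choices:
        # No individual differs from the reference here, so there is nothing to choose from.
        return
    data.susceptible_loc = loc
    data.susceptible_base = _choose_one(sorted(choices))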
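
The set arithmetic in this function can be shown in isolation. The sketch below uses a hypothetical helper, `variant_bases`, to collect the bases seen at one location and remove the reference base. Sorting matters because the iteration order of a set of strings can vary between Python processes, which would otherwise undermine reproducibility under a fixed seed.

```python
def variant_bases(reference: str, individuals: list[str], loc: int) -> list[str]:
    """Bases observed at `loc` that differ from the reference, in sorted order."""
    return sorted({ind[loc] for ind in individuals} - {reference[loc]})


reference = 'ACGT'
individuals = ['ACGT', 'ATGT', 'AGGT']

print(variant_bases(reference, individuals, 1))  # ['G', 'T']
print(variant_bases(reference, individuals, 0))  # []
```

An empty result means that no individual differs from the reference at that location, so there is no candidate for the susceptible base.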

_random_bases(length)

Generate a random sequence of bases of the specified length.

Parameters:

  • length (int, required): desired genome length.

Returns:

  • str: Random sequence of bases of required length.

Source code in snailz/genomes.py
def _random_bases(length: int) -> str:
    '''Generate a random sequence of bases of the specified length.

    Args:
        length: desired genome length.

    Returns:
        Random sequence of bases of required length.
    '''
    assert 0 < length
    return ''.join(random.choices(DNA, k=length))
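
A short sketch of why seeding matters here: with the same seed, the same sequence is generated. The `DNA` constant is assumed to be the string `'ACGT'`, since its definition is not shown on this page, and `random_bases` is a local stand-in for the private function.

```python
import random

DNA = 'ACGT'  # assumed alphabet; the module's own constant is not shown here


def random_bases(length: int) -> str:
    """Local stand-in for _random_bases."""
    assert 0 < length
    return ''.join(random.choices(DNA, k=length))


random.seed(1234)
first = random_bases(20)
random.seed(1234)
second = random_bases(20)

print(first == second)  # True: the same seed gives the same sequence
```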

_random_genomes(params)

Generate a set of genomes with the specified number of point mutations.

Parameters:

  • params (GenomeParams, required): genome generation parameters.

Returns:

  • GenePool: A GenePool object suitable for serialization.

Source code in snailz/genomes.py
def _random_genomes(params: GenomeParams) -> GenePool:
    '''Generate a set of genomes with the specified number of point mutations.

    Args:
        params: genome generation parameters.

    Returns:
        A GenePool object suitable for serialization.
    '''
    assert 0 <= params.num_snp <= params.length

    # Reference genome and the individual genomes to modify.
    reference = _random_bases(params.length)
    individuals = [reference] * params.num_genomes

    # Locations for SNPs.
    locations = random.sample(list(range(params.length)), params.num_snp)

    # Introduce significant mutations.
    for loc in locations:
        candidates = _other_bases(reference, loc)
        bases = [reference[loc]] + random.sample(candidates, k=len(candidates))
        individuals = [_mutate_snps(params, reference, ind, loc, bases) for ind in individuals]

    # Introduce other random mutations.
    other_locations = list(set(range(params.length)) - set(locations))
    individuals = [
        _mutate_other(ind, params.prob_other, other_locations) for ind in individuals
    ]

    # Return structure.
    individuals.sort()
    locations.sort()
    return GenePool(
        length=params.length, reference=reference, individuals=individuals, locations=locations
    )
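
The way locations are partitioned can be sketched on its own. `random.sample` draws without replacement, so the SNP locations are distinct, and every remaining position is eligible for the other random mutations. The values of `length` and `num_snp` below are illustrative only.

```python
import random

length, num_snp = 10, 3  # illustrative values, not snailz defaults

random.seed(0)

# Distinct positions for SNPs, drawn without replacement.
locations = sorted(random.sample(range(length), num_snp))

# Every position that is not an SNP location may receive another mutation.
other_locations = sorted(set(range(length)) - set(locations))

print(locations, other_locations)
```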

_save(outfile, data)

Save or show generated data.

Parameters:

  • outfile (str, required): output filename; if empty, the JSON is printed instead.
  • data (GenePool, required): the GenePool to be saved.

Source code in snailz/genomes.py
def _save(outfile: str, data: GenePool) -> None:
    '''Save or show generated data.

    Args:
        outfile: output filename.
        data: to be saved.
    '''
    as_text = json.dumps(asdict(data), indent=4)
    if outfile:
        Path(outfile).write_text(as_text)
    else:
        print(as_text)
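
The following sketch exercises the same save-then-load round trip on a temporary file. `Sample` is a hypothetical two-field stand-in for GenePool, and `save` mirrors the logic above rather than importing it.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Sample:
    """Hypothetical stand-in for GenePool with fewer fields."""

    length: int
    reference: str


def save(outfile: str, data: Sample) -> None:
    """Write the data as indented JSON, or print it if no filename is given."""
    as_text = json.dumps(asdict(data), indent=4)
    if outfile:
        Path(outfile).write_text(as_text)
    else:
        print(as_text)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'genomes.json'
    save(str(path), Sample(length=4, reference='ACGT'))
    loaded = json.loads(path.read_text())

print(loaded)
```

Passing an empty string as the filename sends the JSON to standard output instead, which is convenient for piping into other tools.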

_mutate_snps(params, reference, genome, loc, bases)

Introduce single nucleotide polymorphisms at the specified location.

Parameters:

  • params (GenomeParams, required): genome generation parameters.
  • reference (str, required): reference genome.
  • genome (str, required): genome to mutate.
  • loc (int, required): where to introduce mutation.
  • bases (list[str], required): alternative bases.

Returns:

  • str: Mutated genome.

Source code in snailz/genomes.py
def _mutate_snps(params: GenomeParams, reference: str, genome: str, loc: int, bases: list[str]) -> str:
    '''Introduce single nucleotide polymorphisms at the specified location.

    Args:
        params: genome generation parameters.
        reference: reference genome.
        genome: genome to mutate.
        loc: where to introduce mutation.
        bases: alternative bases.

    Returns:
        Mutated genome.
    '''
    choice = _choose_one(bases, params.snp_probs)
    return genome[:loc] + choice + genome[loc + 1 :]
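
Python strings are immutable, so the mutation is expressed by slicing around the target position and concatenating. A minimal sketch, using a hypothetical helper named `replace_base`:

```python
def replace_base(genome: str, loc: int, base: str) -> str:
    """Return a copy of `genome` with the base at `loc` replaced."""
    return genome[:loc] + base + genome[loc + 1:]


print(replace_base('ACGT', 2, 'T'))  # ACTT
print(replace_base('ACGT', 0, 'G'))  # GCGT
```

The original string is left untouched, which is why sharing one reference string among many individuals before mutation is safe.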

_mutate_other(genome, prob, locations)

Introduce other mutations at specified locations.

Parameters:

  • genome (str, required): to be mutated.
  • prob (float, required): probability of mutation.
  • locations (list, required): where mutation might occur.

Returns:

  • str: Possibly-mutated genome.

Source code in snailz/genomes.py
def _mutate_other(genome: str, prob: float, locations: list) -> str:
    '''Introduce other mutations at specified locations.

    Args:
        genome: to be mutated.
        prob: probability of mutation.
        locations: where mutation might occur.

    Returns:
        Possibly-mutated genome.
    '''
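    # With no eligible locations there is nothing to mutate (sampling from an empty list would fail).
    if not locations or random.random() > prob:
        return genome
    loc = random.sample(locations, k=1)[0]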
    base = random.choice(_other_bases(genome, loc))
    genome = genome[:loc] + base + genome[loc + 1 :]
    return genome
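
The behavior at the two extremes of `prob` can be checked with a small self-contained sketch. `mutate_other` below is a simplified stand-in, not the function above: it compares with `>=` so that a probability of zero can never mutate, and it assumes the alphabet is `'ACGT'`.

```python
import random

DNA = 'ACGT'  # assumed alphabet


def mutate_other(genome: str, prob: float, locations: list[int]) -> str:
    """Simplified stand-in: mutate at most one base, with probability `prob`."""
    if not locations or random.random() >= prob:
        return genome
    loc = random.choice(locations)
    base = random.choice(sorted(set(DNA) - {genome[loc]}))
    return genome[:loc] + base + genome[loc + 1:]


def differences(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


random.seed(42)
genome = 'ACGTACGT'
always = mutate_other(genome, 1.0, [1, 3, 5])
never = mutate_other(genome, 0.0, [1, 3, 5])

print(differences(genome, always), differences(genome, never))  # 1 0
```

Each call changes at most one base, so `prob` is the chance that a genome receives a single extra mutation rather than a per-base mutation rate.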

_choose_one(values, weights=None)

Convenience wrapper to choose a single item with weighted probabilities.

Parameters:

  • values (list, required): what to choose from.
  • weights (list | None, default None): optional list of weights.

Returns:

  • object: One value chosen at random from those given.

Source code in snailz/genomes.py
def _choose_one(values: list, weights: list|None = None) -> object:
    '''Convenience wrapper to choose a single item with weighted probabilities.

    Args:
        values: what to choose from.
        weights: optional list of weights.

    Returns:
        One value chosen at random from those given.
    '''
    return random.choices(values, weights=weights, k=1)[0]
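
To show how the weights steer the choice, this sketch calls `random.choices` directly with an illustrative weight list in which only one value has non-zero weight. The weights are invented and do not reflect actual snailz parameters.

```python
import random
from collections import Counter

random.seed(7)

bases = ['A', 'C', 'G', 'T']
weights = [0.0, 1.0, 0.0, 0.0]  # illustrative: only 'C' can be chosen

# random.choices returns a list, so [0] extracts the single pick.
picks = Counter(random.choices(bases, weights=weights, k=1)[0] for _ in range(100))

print(picks)
```

When `weights` is `None`, `random.choices` treats all values as equally likely.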

_other_bases(seq, loc)

Create a list of bases minus the one in the sequence at that location.

Returns a list instead of a set because the result is used in random.choices(), which requires an indexable sequence. Result is sorted for reproducibility.

Parameters:

  • seq (str, required): base sequence.
  • loc (int, required): location of base to not choose.

Returns:

  • list: List of other bases.

Source code in snailz/genomes.py
def _other_bases(seq: str, loc: int) -> list:
    '''Create a list of bases minus the one in the sequence at that location.

    Returns a list instead of a set because the result is used in random.choices(),
      which requires an indexable sequence. Result is sorted for reproducibility.

    Args:
        seq: base sequence.
        loc: location of base to _not_ choose.

    Returns:
        List of other bases.
    '''
    return list(sorted(set(DNA) - {seq[loc]}))
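
A quick sketch of the result, assuming the `DNA` constant is `'ACGT'` (its definition is not shown on this page) and using a local stand-in for the private function:

```python
DNA = 'ACGT'  # assumed alphabet


def other_bases(seq: str, loc: int) -> list[str]:
    """Local stand-in: every base except the one at `loc`, in sorted order."""
    return sorted(set(DNA) - {seq[loc]})


print(other_bases('ACGT', 0))  # ['C', 'G', 'T']
print(other_bases('ACGT', 3))  # ['A', 'C', 'G']
```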