Syracuse library documentation
This package provides some tools to deal with the Collatz conjecture.
The Collatz conjecture, also known as the "3n + 1 problem" or the "Syracuse problem", is an unsolved hypothesis in mathematics that concerns a sequence of operations applied to a positive integer. The conjecture states that no matter what the starting value of this integer is, the sequence will always eventually reach the value 1.
More specifically, the Collatz conjecture states that if one takes a positive integer n and applies the following function:
- If n is even, divide it by 2.
- If n is odd, multiply it by 3 and add 1.
Then, repeat the process with the resulting value, applying the function over and over again until the value of n eventually reaches 1.
Example
If one takes n = 6, the sequence for the Collatz conjecture would be: 6, 3, 10, 5, 16, 8, 4, 2, 1.
Although the conjecture has been computationally tested for extremely large values, it remains unproven to this day.
Since 3n+1 is even whenever n is odd, it is possible to define a "shortcut" (or "compressed") form for the Collatz function:
- If n is even, divide it by 2.
- If n is odd, multiply it by 3, add 1 and divide the whole result by 2.
This definition yields smaller values for the stopping time and total stopping time without changing the overall dynamics of the process.
See the related article on Wikipedia for deeper information.
The package provides the following modules:
Module | Description |
---|---|
core |
Provides classes to define Collatz sequences |
drawing |
Provides tool functions to render in various ways the Collatz sequences |
sequences |
Provides well-known sequences related to Collatz sequences |
It is important to notice that the core
module can be used without being imported explicitly. However, all other modules need to be imported explicitly.
>>> import syracuse # import all objects exposed by the core module
>>> import syracuse.drawing # import objects exposed by the drawing module
>>> import syracuse.sequences # import objects exposed by the sequences module
syracuse.core
This module provides two classes to represent a full or a compressed form of Collatz sequence.
Syracuse
A (full) Collatz sequence instanciated with a given initial value.
The instance is iterable, and gives the successive values of the Collatz sequence.
Be aware that a Collatz sequence is infinite: once it reaches 1 (and that is apparently the case for all tested initial values until now !), the cycle 4,2,1 returns indefinitely. So make sure you have implemented a stopping condition when you iterate over the sequence.
For each sequence an underlying, directional graph is implemented, and is populated the first time several attributes are read (like total_stopping_time
, max
and of course graph
and graph_view
), resulting an additionnal computing duration, that can be noticeable on "big" sequences. But the duration becomes imperceptible next times those attributes are read.
Furthermore, a "global" graph can be optionaly initialized, and is shared with all instances. It is basically useful when working on a large number of sequences, or on same sequences repeatedly. If a global graph population is asked for a sequence instantiation, then the local graph is deduced from it only when needed, and stored in a class attribute for further use. Be aware that the global graph activation generates a significant amount of additional memory storage, but on the other side the computation speed is generally highly improved.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
initial |
int
|
Initial value of the Collatz sequence. Must be strictly positive. |
required |
populate_global_graph |
bool
|
If |
False
|
Attributes:
Name | Type | Description |
---|---|---|
initial_value |
int
|
Reminder of the initial value given for the sequence initialisation. read-only |
stopping_time |
int
|
The smallest rank such that the value becomes lower than the initial one. read-only |
total_stopping_time |
int
|
The smallest rank such that the value equals 1. read-only |
total_stopping_sequence |
Tuple[int]
|
Tuple of the sequence from the initial value, to 1. read-only |
max |
int
|
Maximum value of the sequence. read-only |
max_rank |
int
|
Rank of the maximum value of the sequence. read-only |
graph_view |
dict
|
A dict-like structure representing the graph of the successive members of the current sequence. read-only |
graph |
networkx.Digraph
|
The underlying, directional graph for the current Collatz sequence. read-only |
global_graph |
networkx.Digraph
|
The global graph shared with all sequences, gathering in the same graph all sequences' graphs instanciated with |
single_graphs |
dict[networkx.Digraph]
|
Store the graphs generated for a single sequence to avoid repeated computations of the same graph (used only if |
generate_global_graph(max_initial_value, min_initial_value=1, excludes=[], reverse=False, parallel=False, parallel_algo=0)
classmethod
Generate a global graph gathering all Collatz sequences with initial values from min_initial_value
to max_initial_value
, without those beginning by members of excludes
.
To populate the graph, this function temporarly instanciates the needed Syracuse objects, that can cause lot of memory consumtion; nevertheless each instance are deleted before the initialization of the next one.
The resulting graph is stored in the class attribute global_graph
, and is returned by this function too for convenience.
It is possible to switch to an alternative computation algorithm, using the ability of the computer/OS to execute simultaneous tasks. Depending on the hardware (ie: number of "cores" of the CPU), the benefit can be really interesting for a large range of values (the definition of "large" depends heavily on your configuration). For the most little ranges, it is better to use the classical, sequential approach.
The available types of parallel computation are the following:
Type number | Description |
---|---|
0 | Usage of multiprocessing.Pool objects. |
1 | Usage of multiprocessing.Process objects: each process computes a part of edges and nodes of the global graph as shared objects, and the global graph is built in the parent process once all edges and nodes are ready. |
2 | Usage of multiprocessing.Process objects: each process computes its own part of the global graph, and the full global graph is later gathered in the parent process. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
min_initial_value |
int
|
The minimal initial value of the proceeded sequences. |
1
|
max_initial_value |
int
|
The maximal initial value of the proceeded sequences. |
required |
excludes |
list[int]
|
list of sequences excluded in the range. No effect for members lower than |
[]
|
reverse |
bool
|
If |
False
|
parallel |
bool
|
If |
False
|
parallel_algo |
int
|
Type of algorithm used for the parallel computation. Only relevant when |
0
|
Returns:
Type | Description |
---|---|
nx.DiGraph
|
networkx.DiGraph: A representation of the global graph of the Collatz sequences. |
total_stopping_times_range(max_initial_value, min_initial_value=1, parallel=False)
classmethod
Generate the tuple of the total stopping times of all Collatz sequences with initial values from min_initial_value
to max_initial_value
.
It is possible to switch to an alternative computation algorithm, using the ability of the computer/OS to execute simultaneous tasks. Depending on the hardware (ie: number of "cores" of the CPU), the benefit can be really interesting for a large range of values (the definition of "large" depends heavily on your configuration). For the most little ranges, it is better to use the classical, sequential approach.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
min_initial_value |
int
|
The minimal initial value of the proceeded sequences |
1
|
max_initial_value |
int
|
The maximal initial value of the proceeded sequences |
required |
parallel |
bool
|
If True, activates the parallel computation algorithm, using pool of multiprocessing workers |
False
|
Returns:
Type | Description |
---|---|
Tuple[int]
|
The ordered total stopping times of the Collatz sequences |
max_reached_values_range(max_initial_value, min_initial_value=1, parallel=False)
classmethod
Generate the tuple of the maximal values reached in all Collatz sequences with initial values from min_initial_value
to max_initial_value
.
It is possible to switch to an alternative computation algorithm, using the ability of the computer/OS to execute simultaneous tasks. Depending on the hardware (ie: number of "cores" of the CPU), the benefit can be really interesting for a large range of values (the definition of "large" depends heavily on your configuration). For the most little ranges, it is better to use the classical, sequential approach.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
min_initial_value |
int
|
The minimal initial value of the proceeded sequences |
1
|
max_initial_value |
int
|
The maximal initial value of the proceeded sequences |
required |
parallel |
bool
|
If True, activates the parallel computation algorithm, using pool of multiprocessing workers |
False
|
Returns:
Type | Description |
---|---|
Tuple[int]
|
A tuple with the ordered maximal reached values of the Collatz sequences |
reset_global_graph()
classmethod
Reset all graphs recorded in the class level: the global graph and the single_graphs list.
CompressedSyracuse
Bases: Syracuse
A compressed Collatz sequence instanciated with a given initial value.
This class provides the same parameters, attributes and methods as the inherited Syracuse class.
syracuse.drawing
This module provides several functions to render Collatz sequences into various displaying formats
single_sequence_to_dot_string(syr_sequence, converter='native')
Create a string containing a Graphviz dot format representing a single Collatz sequence, that can be read by Graphviz to render into various image formats.
It is possible to choose among several "converters" to create dot strings from the Collatz sequence. The "native" one (the default) does not need any external module. It is inspirated by https://en.wikipedia.org/wiki/File:Collatz-graph-300.svg. The "pydot" one uses the pydot
library, and creates the dot string from the internal Networkx graph.
Examples:
>>> import syracuse
>>> import syracuse.drawing
>>>
>>> # Native converter
>>> syr = syracuse.Syracuse(6)
>>> collatz6 = syracuse.drawing.single_sequence_to_dot_string(syr)
>>> print(collatz6)
digraph {
6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1;
}
>>>
>>> # pydot converter
>>> syr = syracuse.Syracuse(10)
>>> collatz10 = syracuse.drawing.single_sequence_to_dot_string(syr, "pydot")
>>> print(collatz10)
strict digraph {
10;
5;
16;
8;
4;
2;
1;
10 -> 5;
5 -> 16;
16 -> 8;
8 -> 4;
4 -> 2;
2 -> 1;
}
Parameters:
Name | Type | Description | Default |
---|---|---|---|
syr_sequence |
Syracuse
|
Sequence to be rendered |
required |
converter |
str
|
The converter used to create the dot string. The available values are: "native" or "pydot" |
'native'
|
Returns:
Type | Description |
---|---|
str
|
A Graphviz dot format representation of the sequence graph |
range_sequences_to_dot_string(compressed=False, limit=10, orientation='portrait', excludes=[], colored=False, converter='native')
Create a string containing a Graphviz dot format representing a range of Collatz sequences, that can be read by Graphviz to render into various image formats.
It is possible to choose among several "converters" to create dot strings from the Collatz sequence. The "native" one (the default) does not need any external module. It is inspirated by https://en.wikipedia.org/wiki/File:Collatz-graph-300.svg. The "pydot" one uses the pydot
library, and creates the dot string from the Networkx global graph.
Examples:
>>> import syracuse
>>> import syracuse.drawing
>>> collatz_range15 = syracuse.drawing.range_sequences_to_dot_string(limit = 15)
>>> print(collatz_range15)
digraph {
1;
2 -> 1;
3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2;
4;
5;
6 -> 3;
7 -> 22 -> 11 -> 34 -> 17 -> 52 -> 26 -> 13 -> 40 -> 20 -> 10;
8;
9 -> 28 -> 14 -> 7;
10;
11;
12 -> 6;
13;
14;
15 -> 46 -> 23 -> 70 -> 35 -> 106 -> 53 -> 160 -> 80 -> 40;
}
>>> collatz_range10_horizontal_colored = syracuse.drawing.range_sequences_to_dot_string(orientation = "landscape", colored = True)
>>> print(collatz_range10_horizontal_colored)
digraph {
node [colorscheme=spectral10]
1 [color=1]
2 [color=1]
3 [color=1]
10 [color=3]
5 [color=2]
16 [color=4]
8 [color=2]
4 [color=1]
6 [color=2]
7 [color=2]
22 [color=5]
11 [color=3]
34 [color=7]
17 [color=4]
52 [color=10]
26 [color=6]
13 [color=3]
40 [color=9]
20 [color=5]
9 [color=2]
28 [color=6]
14 [color=3]
rankdir="LR"
1;
2 -> 1;
3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2;
4;
5;
6 -> 3;
7 -> 22 -> 11 -> 34 -> 17 -> 52 -> 26 -> 13 -> 40 -> 20 -> 10;
8;
9 -> 28 -> 14 -> 7;
10;
}
Parameters:
Name | Type | Description | Default |
---|---|---|---|
compressed |
bool
|
indicates weither one deals with either compressed or "normal" sequences |
False
|
limit |
int
|
the upper bound of the range values from whom the sequences are generated |
10
|
orientation |
str
|
orientation of the rendered graph. Possible values: |
'portrait'
|
excludes |
list[int]
|
list of sequences excluded in the range. No effect for members greater than |
[]
|
colored |
bool
|
If |
False
|
converter |
str
|
The converter used to create the dot string. The available values are: "native" or "pydot". Please notice that this parameter is only relevant if colored is |
'native'
|
Returns:
Type | Description |
---|---|
str
|
A Graphviz dot format representation of the sequences range graph |
single_sequence_to_matplotlib_figure(syr_sequence, test=False)
Create a Matplotlib Figure object representing the evolution of the successive values of a particular Collatz sequence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
syr_sequence |
Syracuse
|
Sequence to be rendered |
required |
test |
bool
|
if |
False
|
Returns:
Type | Description |
---|---|
mpl.figure.Figure
|
matplotlib.figure.Figure: A representation of the successive values of the Collatz sequence |
range_sequences_distribution_to_matplotlib_figure(max_initial_value, min_initial_value=1, statistic=True, test=False)
Create a Matplotlib Figure object representing the distribution of the successive values of a range of Collatz sequences.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
min_initial_value |
int
|
The minimal initial value of the proceeded sequences |
1
|
max_initial_value |
int
|
The maximal initial value of the proceeded sequences |
required |
statistic |
bool
|
If |
True
|
test |
bool
|
if |
False
|
Returns:
Type | Description |
---|---|
mpl.figure.Figure
|
matplotlib.figure.Figure: A bar graph representation of the distribution |
syracuse.sequences
This module provides several well-known sequences related to Collatz sequences.
Unless explicitely mentioned, all sequences provided here implement the iterator protocol. That is, they can be used in for
loops, and as parameter for the built-in iter()
and next()
functions. It also means that only one item is kept in memory at a time, the next items being generated on demand or lazily. Unlike container types like lists or dictionnaries, data are not stored but forgotten once the next item is yield. By the way, iterators are very memory-efficient and can process infinite data streams, like most of the sequences generated here.
The available sequences are the following:
sequence | OEIS reference | Description |
---|---|---|
total_stopping_time_records | A006877 | Sequence of starting values of Collatz sequences with a total stopping time longer than of any smaller starting value. |
max_value_records | A006884 | Sequence of starting values of Collatz sequences with maximum values higher than of any smaller starting value. |
total_stopping_time_records
Bases: Iterator
Sequence of starting values of Collatz sequences with a total stopping time longer than of any smaller starting value. OEIS reference: A006877.
Example
6 is included in this sequence because the total stopping time of the Collatz sequence beginning by 6 (that is, 8) is greater than total stopping times of all Collatz sequences beginning by 1, 2, 3, 4 and 5.
In order to speedup the computation, it is possible to apply optional optimizations. Those optimizations are combinable (by addition of their numbers) for a greater efficiency. Deeper details (including mathematical demontrations) are available in this article from T. Leavens and M. Vermeulen.
The available sorts of optimizations are the following (the speedup factor is calculated with the first 50 records. It is only indicative, since it may differ on your own configuration):
Number | Name | Description | Speedup factor |
---|---|---|---|
0 | None | No optimization | 1.0 |
1 | Even numbers | Due to the fact that Syracuse(2k).total_stopping_time = Syracuse(k).total_stopping_time + 1 , it is easy to predict the next even candidate for this sequence, and then it is not necessary to compute the Collatz sequences for the other even numbers below this candidate. |
1.9 |
2 | k mod 6 = 5 | If the remainder of the division of the initial value by 6 is 5, then this cannot be a record for the total stopping time. | 1.2 |
4 | a posteriori cutoff | It is possible to stop the iteration process of a Collatz sequence before its end, by comparing the current iterate value with all previous records and the number of steps necessary to reach it. This optimization needs to store all items previously computed, making this iterator less memory-efficient. | 2.0 |
8 | make_odd | Speed up the iterations of the Collatz sequences by replacing the successive divisions by 2, by only one step. Be aware that using this "optimization" alone is absolutly not efficient (as you can see, the speedup factor is more or less 1). However, it is a real "booster" when associated with other optimization algorithms. For instance, the a posteriori cutoff is boosted with a speedup factor of 3.3 when used together with the "make_odd" one. | 1.0 |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
optimization |
int
|
Type of optimization(s) applied. No optimization by default. |
0
|
max_value_records
Bases: Iterator
Sequence of starting values of Collatz sequences with maximum values higher than of any smaller starting value. OEIS reference: A006884.
Example
7 is included in this sequence because the maximum value of the Collatz sequence beginning by 7 (that is, 52) is greater than maximum values of all Collatz sequences beginning by 1, 2, 3, 4, 5 and 6.
In order to speedup the computation, it is possible to apply optional optimizations. Those optimizations are combinable (by addition of their numbers) for a greater efficiency. Deeper details (including mathematical demontrations) are available in this article from T. Leavens and M. Vermeulen.
The available sorts of optimizations are the following (the speedup factor is calculated with the first 40 records. It is only indicative, since it may differ on your own configuration):
Number | Name | Description | Speedup factor |
---|---|---|---|
0 | None | No optimization | 1.0 |
1 | Odd numbers only | Except 2, there is no even record for max value. So it is not necessary to compute Collatz sequences starting by even values. | 2.0 |
2 | k mod 6 = 5 | If the remainder of the division of the initial value by 6 is 5, then this cannot be a record for the max value. | 1.2 |
4 | a posteriori cutoff | A record, if found, is a max value appearing before the iteration process has fallen under the initial value. Hence, once the initial value is reached, it is possible to stop the iteration process. | 8.0 |
8 | make_odd | Speed up the iterations of the Collatz sequences by replacing the successive divisions by 2, by only one step. | 1.9 |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
optimization |
int
|
Type of optimization(s) applied. No optimization by default. |
0
|