Skip to content

Syracuse library documentation

This package provides some tools to deal with the Collatz conjecture.

The Collatz conjecture, also known as the "3n + 1 problem" or the "Syracuse problem", is an unsolved hypothesis in mathematics that concerns a sequence of operations applied to a positive integer. The conjecture states that no matter what the starting value of this integer is, the sequence will always eventually reach the value 1.

More specifically, the Collatz conjecture states that if one takes a positive integer n and applies the following function:

  • If n is even, divide it by 2.
  • If n is odd, multiply it by 3 and add 1.

Then, repeat the process with the resulting value, applying the function over and over again until the value of n eventually reaches 1.

Example

If one takes n = 6, the sequence for the Collatz conjecture would be: 6, 3, 10, 5, 16, 8, 4, 2, 1.

Although the conjecture has been computationally tested for extremely large values, it remains unproven to this day.

Since 3n+1 is even whenever n is odd, it is possible to define a "shortcut" (or "compressed") form for the Collatz function:

  • If n is even, divide it by 2.
  • If n is odd, multiply it by 3, add 1 and divide the whole result by 2.

This definition yields smaller values for the stopping time and total stopping time without changing the overall dynamics of the process.

See the related article on Wikipedia for deeper information.

The package provides the following modules:

Module Description
core Provides classes to define Collatz sequences
drawing Provides tool functions to render in various ways the Collatz sequences
sequences Provides well-known sequences related to Collatz sequences

It is important to notice that the core module can be used without being imported explicitly. However, all other modules need to be imported explicitly.

>>> import syracuse # import all objects exposed by the core module
>>> import syracuse.drawing # import objects exposed by the drawing module
>>> import syracuse.sequences # import objects exposed by the sequences module

syracuse.core

This module provides two classes to represent a full or a compressed form of Collatz sequence.

Syracuse

A (full) Collatz sequence instanciated with a given initial value.

The instance is iterable, and gives the successive values of the Collatz sequence.

Example

1
2
3
4
5
syracuse = Syracuse(1000)
for value in syracuse:
        print(value)
        if value == 1:
                break

Be aware that a Collatz sequence is infinite: once it reaches 1 (and that is apparently the case for all tested initial values until now !), the cycle 4,2,1 returns indefinitely. So make sure you have implemented a stopping condition when you iterate over the sequence.

For each sequence an underlying, directional graph is implemented, and is populated the first time several attributes are read (like total_stopping_time, max and of course graph and graph_view), resulting an additionnal computing duration, that can be noticeable on "big" sequences. But the duration becomes imperceptible next times those attributes are read.

Furthermore, a "global" graph can be optionaly initialized, and is shared with all instances. It is basically useful when working on a large number of sequences, or on same sequences repeatedly. If a global graph population is asked for a sequence instantiation, then the local graph is deduced from it only when needed, and stored in a class attribute for further use. Be aware that the global graph activation generates a significant amount of additional memory storage, but on the other side the computation speed is generally highly improved.

Parameters:

Name Type Description Default
initial int

Initial value of the Collatz sequence. Must be strictly positive.

required
populate_global_graph bool

If True, a global graph is built, and populated with the graph of the current sequence. Default is False

False

Attributes:

Name Type Description
initial_value int

Reminder of the initial value given for the sequence initialisation. read-only

stopping_time int

The smallest rank such that the value becomes lower than the initial one. read-only

total_stopping_time int

The smallest rank such that the value equals 1. read-only

total_stopping_sequence Tuple[int]

Tuple of the sequence from the initial value, to 1. read-only

max int

Maximum value of the sequence. read-only

max_rank int

Rank of the maximum value of the sequence. read-only

graph_view dict

A dict-like structure representing the graph of the successive members of the current sequence. read-only

graph networkx.Digraph

The underlying, directional graph for the current Collatz sequence. read-only

global_graph networkx.Digraph

The global graph shared with all sequences, gathering in the same graph all sequences' graphs instanciated with populate_global_graph = True. read-only

single_graphs dict[networkx.Digraph]

Store the graphs generated for a single sequence to avoid repeated computations of the same graph (used only if populate_global_graph is True)

generate_global_graph(max_initial_value, min_initial_value=1, excludes=[], reverse=False, parallel=False, parallel_algo=0) classmethod

Generate a global graph gathering all Collatz sequences with initial values from min_initial_value to max_initial_value, without those beginning by members of excludes.

To populate the graph, this function temporarly instanciates the needed Syracuse objects, that can cause lot of memory consumtion; nevertheless each instance are deleted before the initialization of the next one. The resulting graph is stored in the class attribute global_graph, and is returned by this function too for convenience.

It is possible to switch to an alternative computation algorithm, using the ability of the computer/OS to execute simultaneous tasks. Depending on the hardware (ie: number of "cores" of the CPU), the benefit can be really interesting for a large range of values (the definition of "large" depends heavily on your configuration). For the most little ranges, it is better to use the classical, sequential approach.

The available types of parallel computation are the following:

Type number Description
0 Usage of multiprocessing.Pool objects.
1 Usage of multiprocessing.Process objects: each process computes a part of edges and nodes of the global graph as shared objects, and the global graph is built in the parent process once all edges and nodes are ready.
2 Usage of multiprocessing.Process objects: each process computes its own part of the global graph, and the full global graph is later gathered in the parent process.

Parameters:

Name Type Description Default
min_initial_value int

The minimal initial value of the proceeded sequences.

1
max_initial_value int

The maximal initial value of the proceeded sequences.

required
excludes list[int]

list of sequences excluded in the range. No effect for members lower than min_initial_value or greater than max_initial_value.

[]
reverse bool

If True, generate the global graph by computing the constitutive sequences from the maximal to the minimal initial value. Only relevant for non-parallel computation (ie: if parallel is False).

False
parallel bool

If True, activates the parallel computation algorithm, using pool of multiprocessing workers.

False
parallel_algo int

Type of algorithm used for the parallel computation. Only relevant when parallel is True.

0

Returns:

Type Description
nx.DiGraph

networkx.DiGraph: A representation of the global graph of the Collatz sequences.

total_stopping_times_range(max_initial_value, min_initial_value=1, parallel=False) classmethod

Generate the tuple of the total stopping times of all Collatz sequences with initial values from min_initial_value to max_initial_value.

It is possible to switch to an alternative computation algorithm, using the ability of the computer/OS to execute simultaneous tasks. Depending on the hardware (ie: number of "cores" of the CPU), the benefit can be really interesting for a large range of values (the definition of "large" depends heavily on your configuration). For the most little ranges, it is better to use the classical, sequential approach.

Parameters:

Name Type Description Default
min_initial_value int

The minimal initial value of the proceeded sequences

1
max_initial_value int

The maximal initial value of the proceeded sequences

required
parallel bool

If True, activates the parallel computation algorithm, using pool of multiprocessing workers

False

Returns:

Type Description
Tuple[int]

The ordered total stopping times of the Collatz sequences

max_reached_values_range(max_initial_value, min_initial_value=1, parallel=False) classmethod

Generate the tuple of the maximal values reached in all Collatz sequences with initial values from min_initial_value to max_initial_value.

It is possible to switch to an alternative computation algorithm, using the ability of the computer/OS to execute simultaneous tasks. Depending on the hardware (ie: number of "cores" of the CPU), the benefit can be really interesting for a large range of values (the definition of "large" depends heavily on your configuration). For the most little ranges, it is better to use the classical, sequential approach.

Parameters:

Name Type Description Default
min_initial_value int

The minimal initial value of the proceeded sequences

1
max_initial_value int

The maximal initial value of the proceeded sequences

required
parallel bool

If True, activates the parallel computation algorithm, using pool of multiprocessing workers

False

Returns:

Type Description
Tuple[int]

A tuple with the ordered maximal reached values of the Collatz sequences

reset_global_graph() classmethod

Reset all graphs recorded in the class level: the global graph and the single_graphs list.

CompressedSyracuse

Bases: Syracuse

A compressed Collatz sequence instanciated with a given initial value.

This class provides the same parameters, attributes and methods as the inherited Syracuse class.

syracuse.drawing

This module provides several functions to render Collatz sequences into various displaying formats

single_sequence_to_dot_string(syr_sequence, converter='native')

Create a string containing a Graphviz dot format representing a single Collatz sequence, that can be read by Graphviz to render into various image formats.

It is possible to choose among several "converters" to create dot strings from the Collatz sequence. The "native" one (the default) does not need any external module. It is inspirated by https://en.wikipedia.org/wiki/File:Collatz-graph-300.svg. The "pydot" one uses the pydot library, and creates the dot string from the internal Networkx graph.

Examples:

>>> import syracuse
>>> import syracuse.drawing
>>>
>>> # Native converter
>>> syr = syracuse.Syracuse(6)
>>> collatz6 = syracuse.drawing.single_sequence_to_dot_string(syr)
>>> print(collatz6)
digraph {
6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1;
}
>>>
>>> # pydot converter
>>> syr = syracuse.Syracuse(10)
>>> collatz10 = syracuse.drawing.single_sequence_to_dot_string(syr, "pydot")
>>> print(collatz10)
strict digraph {
10;
5;
16;
8;
4;
2;
1;
10 -> 5;
5 -> 16;
16 -> 8;
8 -> 4;
4 -> 2;
2 -> 1;
}

Parameters:

Name Type Description Default
syr_sequence Syracuse

Sequence to be rendered

required
converter str

The converter used to create the dot string. The available values are: "native" or "pydot"

'native'

Returns:

Type Description
str

A Graphviz dot format representation of the sequence graph

range_sequences_to_dot_string(compressed=False, limit=10, orientation='portrait', excludes=[], colored=False, converter='native')

Create a string containing a Graphviz dot format representing a range of Collatz sequences, that can be read by Graphviz to render into various image formats.

It is possible to choose among several "converters" to create dot strings from the Collatz sequence. The "native" one (the default) does not need any external module. It is inspirated by https://en.wikipedia.org/wiki/File:Collatz-graph-300.svg. The "pydot" one uses the pydot library, and creates the dot string from the Networkx global graph.

Examples:

>>> import syracuse
>>> import syracuse.drawing
>>> collatz_range15 = syracuse.drawing.range_sequences_to_dot_string(limit = 15)
>>> print(collatz_range15)
digraph {
1;
2 -> 1;
3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2;
4;
5;
6 -> 3;
7 -> 22 -> 11 -> 34 -> 17 -> 52 -> 26 -> 13 -> 40 -> 20 -> 10;
8;
9 -> 28 -> 14 -> 7;
10;
11;
12 -> 6;
13;
14;
15 -> 46 -> 23 -> 70 -> 35 -> 106 -> 53 -> 160 -> 80 -> 40;
}
>>> collatz_range10_horizontal_colored = syracuse.drawing.range_sequences_to_dot_string(orientation = "landscape", colored = True)
>>> print(collatz_range10_horizontal_colored)
digraph {
node [colorscheme=spectral10]
1 [color=1]
2 [color=1]
3 [color=1]
10 [color=3]
5 [color=2]
16 [color=4]
8 [color=2]
4 [color=1]
6 [color=2]
7 [color=2]
22 [color=5]
11 [color=3]
34 [color=7]
17 [color=4]
52 [color=10]
26 [color=6]
13 [color=3]
40 [color=9]
20 [color=5]
9 [color=2]
28 [color=6]
14 [color=3]
rankdir="LR"
1;
2 -> 1;
3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2;
4;
5;
6 -> 3;
7 -> 22 -> 11 -> 34 -> 17 -> 52 -> 26 -> 13 -> 40 -> 20 -> 10;
8;
9 -> 28 -> 14 -> 7;
10;
}

Parameters:

Name Type Description Default
compressed bool

indicates weither one deals with either compressed or "normal" sequences

False
limit int

the upper bound of the range values from whom the sequences are generated

10
orientation str

orientation of the rendered graph. Possible values: landscape or portrait

'portrait'
excludes list[int]

list of sequences excluded in the range. No effect for members greater than limit.

[]
colored bool

If True, the rendered graph will have gradient colors, depending on the value of each node

False
converter str

The converter used to create the dot string. The available values are: "native" or "pydot". Please notice that this parameter is only relevant if colored is False (if colored is True, the native converter is used).

'native'

Returns:

Type Description
str

A Graphviz dot format representation of the sequences range graph

single_sequence_to_matplotlib_figure(syr_sequence, test=False)

Create a Matplotlib Figure object representing the evolution of the successive values of a particular Collatz sequence.

Parameters:

Name Type Description Default
syr_sequence Syracuse

Sequence to be rendered

required
test bool

if True, displays the graph immediately (for testing purposes)

False

Returns:

Type Description
mpl.figure.Figure

matplotlib.figure.Figure: A representation of the successive values of the Collatz sequence

range_sequences_distribution_to_matplotlib_figure(max_initial_value, min_initial_value=1, statistic=True, test=False)

Create a Matplotlib Figure object representing the distribution of the successive values of a range of Collatz sequences.

Parameters:

Name Type Description Default
min_initial_value int

The minimal initial value of the proceeded sequences

1
max_initial_value int

The maximal initial value of the proceeded sequences

required
statistic bool

If True, displays some statistical charateristics on the returned figure

True
test bool

if True, displays the graph immediately (for testing purposes)

False

Returns:

Type Description
mpl.figure.Figure

matplotlib.figure.Figure: A bar graph representation of the distribution

syracuse.sequences

This module provides several well-known sequences related to Collatz sequences.

Unless explicitely mentioned, all sequences provided here implement the iterator protocol. That is, they can be used in for loops, and as parameter for the built-in iter() and next() functions. It also means that only one item is kept in memory at a time, the next items being generated on demand or lazily. Unlike container types like lists or dictionnaries, data are not stored but forgotten once the next item is yield. By the way, iterators are very memory-efficient and can process infinite data streams, like most of the sequences generated here.

The available sequences are the following:

sequence OEIS reference Description
total_stopping_time_records A006877 Sequence of starting values of Collatz sequences with a total stopping time longer than of any smaller starting value.
max_value_records A006884 Sequence of starting values of Collatz sequences with maximum values higher than of any smaller starting value.

total_stopping_time_records

Bases: Iterator

Sequence of starting values of Collatz sequences with a total stopping time longer than of any smaller starting value. OEIS reference: A006877.

Example

6 is included in this sequence because the total stopping time of the Collatz sequence beginning by 6 (that is, 8) is greater than total stopping times of all Collatz sequences beginning by 1, 2, 3, 4 and 5.

In order to speedup the computation, it is possible to apply optional optimizations. Those optimizations are combinable (by addition of their numbers) for a greater efficiency. Deeper details (including mathematical demontrations) are available in this article from T. Leavens and M. Vermeulen.

The available sorts of optimizations are the following (the speedup factor is calculated with the first 50 records. It is only indicative, since it may differ on your own configuration):

Number Name Description Speedup factor
0 None No optimization 1.0
1 Even numbers Due to the fact that Syracuse(2k).total_stopping_time = Syracuse(k).total_stopping_time + 1, it is easy to predict the next even candidate for this sequence, and then it is not necessary to compute the Collatz sequences for the other even numbers below this candidate. 1.9
2 k mod 6 = 5 If the remainder of the division of the initial value by 6 is 5, then this cannot be a record for the total stopping time. 1.2
4 a posteriori cutoff It is possible to stop the iteration process of a Collatz sequence before its end, by comparing the current iterate value with all previous records and the number of steps necessary to reach it. This optimization needs to store all items previously computed, making this iterator less memory-efficient. 2.0
8 make_odd Speed up the iterations of the Collatz sequences by replacing the successive divisions by 2, by only one step. Be aware that using this "optimization" alone is absolutly not efficient (as you can see, the speedup factor is more or less 1). However, it is a real "booster" when associated with other optimization algorithms. For instance, the a posteriori cutoff is boosted with a speedup factor of 3.3 when used together with the "make_odd" one. 1.0

Parameters:

Name Type Description Default
optimization int

Type of optimization(s) applied. No optimization by default.

0

max_value_records

Bases: Iterator

Sequence of starting values of Collatz sequences with maximum values higher than of any smaller starting value. OEIS reference: A006884.

Example

7 is included in this sequence because the maximum value of the Collatz sequence beginning by 7 (that is, 52) is greater than maximum values of all Collatz sequences beginning by 1, 2, 3, 4, 5 and 6.

In order to speedup the computation, it is possible to apply optional optimizations. Those optimizations are combinable (by addition of their numbers) for a greater efficiency. Deeper details (including mathematical demontrations) are available in this article from T. Leavens and M. Vermeulen.

The available sorts of optimizations are the following (the speedup factor is calculated with the first 40 records. It is only indicative, since it may differ on your own configuration):

Number Name Description Speedup factor
0 None No optimization 1.0
1 Odd numbers only Except 2, there is no even record for max value. So it is not necessary to compute Collatz sequences starting by even values. 2.0
2 k mod 6 = 5 If the remainder of the division of the initial value by 6 is 5, then this cannot be a record for the max value. 1.2
4 a posteriori cutoff A record, if found, is a max value appearing before the iteration process has fallen under the initial value. Hence, once the initial value is reached, it is possible to stop the iteration process. 8.0
8 make_odd Speed up the iterations of the Collatz sequences by replacing the successive divisions by 2, by only one step. 1.9

Parameters:

Name Type Description Default
optimization int

Type of optimization(s) applied. No optimization by default.

0