About a year ago, I was using NetworkX (a Python package for studying graphs) for one of my projects. While I was working on it, I found that the topological sort implementation in NetworkX was very slow on largescale instances. So I decided to take a look at it, hoping I can figure out what’s wrong and fix it. In the end, I sped up considerably the implementation. Here is how I did it.
Please note that you don’t need to understand how topological sort works to follow the rest of the article as this is just a pretext to realize that it’s important to choose the right data structures when implementing algorithms.
Below is the original implementation of topological_sort_recursive
in NetworkX.


The main issue in this implementation is that two crucial operations
on the variable explored
, which is a list of nodes, take linear
time instead of taking
a constant time if proper data structures were chosen.
The operations w not in explored
(lines 13 and 25) and
explored.insert(0,v)
(line 16) are inefficient on a list because a
list in CPython is implemented as an array. This means that both a
membership
test
(in
) and inserting at an arbitrary position (insert
) usually take
linear time. In the first case, one must perform a check on each
element and in the second case one must shift all elements to the
right by one position to insert an element at the first position. This
leads to lots of unnecessary memory copies.
To prevent such situations, one should keep in mind the runtime of the operations on the standard data structures. If this is not familiar to you, this page is a good place to start.
Enough said, let’s now look at how to speed up the topological sort implementation.
To fix the in
operation, we simply need to change the type of
explored
to a set. This way, a membership test now takes a constant
time because a set is implemented as a hash
table in CPython.
Since we need to keep track of the order of the visited nodes (this is
the purpose of the insert call at line 16) and a set is unordered, we
must add another variable (order
) of type list to which we insert
the nodes. But instead of inserting a node at the beginning of the
list, we insert it at the end. Appending to a list usually takes a
constant time because we only need to insert a new element at the
end. Finally, we need to reverse the list at the end of the algorithm
to get the same order as in the original implementation.
Here is the new implementation:


I did a similar refactoring for the non recursive implementation and benchmarked my new implementations on a graph with 199,496 nodes and 600,000 arcs. The results are quite impressive:
Implementation  Original (s)  New (s) 

Recursive  4027.5  0.5 
NonRecursive  15.6  0.6 
Before, it took more than a hour to apply the recursive topological sort on this instance and 15.6 seconds for the nonrecursive version. It now takes less than a second in both cases.
I pushed my changes to the git repository and with the recent release of version 1.8 they are now part of the official release.