August 2013
About a year ago, I was using NetworkX (a Python package for studying graphs) for one of my projects. While I was working on it, I found that the topological sort implementation in NetworkX was very slow on largescale instances. So I decided to take a look at it, hoping I can figure out what's wrong and fix it. In the end, I sped up considerably the implementation. Here is how I did it.
Please note that you don't need to understand how topological sort works to follow the rest of the article as this is just a pretext to realize that it's important to choose the right data structures when implementing algorithms.
Below is the original implementation of topological_sort_recursive
in NetworkX.


The main issue in this implementation is that two crucial operations
on the variable explored
, which is a list of nodes, take linear
time instead of taking
a constant time if proper data structures were chosen.
The operations w not in explored
(lines 12 and 24) and
explored.insert(0, v)
(line 15) are inefficient on a list because a
list in CPython is implemented as an array. This means that both a
membership
test
(in
) and inserting at an arbitrary position (insert
) usually take
linear time. In the first case, one must perform a check on each
element and in the second case one must shift all elements to the
right by one position to insert an element at the first position. This
leads to lots of unnecessary memory copies.
To prevent such situations, one should keep in mind the runtime of the operations on the standard data structures. If this is not familiar to you, this page is a good place to start.
Enough said, let's now look at how to speed up the topological sort implementation.
To fix the in
operation, we simply need to change the type of
explored
to a set. This way, a membership test now takes a constant
time because a set is implemented as a hash
table in CPython.
Since we need to keep track of the order of the visited nodes (this is the
purpose of the insert call explored.insert(0, v)
at line 15) and a set is
unordered, we must add another variable (order
) of type list to which we
insert the nodes. But instead of inserting a node at the beginning of the list,
we insert it at the end. Appending to a list usually takes a constant time
because we only need to insert a new element at the end. Finally, we need to
reverse the list at the end of the algorithm to get the same order as in the
original implementation.
Here is the new implementation:
def topological_sort_recursive(G, nbunch=None):
if not G.is_directed():
raise nx.NetworkXError("Topological sort not defined on undirected graphs.")
def _dfs(v):
ancestors.add(v)
for w in G[v]:
if w in ancestors:
raise nx.NetworkXUnfeasible("Graph contains a cycle.")
if w not in explored:
_dfs(w)
ancestors.remove(v)
explored.add(v)
order.append(v)
ancestors = set()
explored = set()
order = []
if nbunch is None:
nbunch = G.nodes_iter()
for v in nbunch:
if v not in explored:
_dfs(v)
return list(reversed(order))
I did a similar refactoring for the non recursive implementation and benchmarked my new implementations on a graph with 199,496 nodes and 600,000 arcs. The results are quite impressive:
Implementation  Original time (s)  New time (s) 

Recursive  4027.5  0.5 
NonRecursive  15.6  0.6 
Before, it took more than a hour to apply the recursive topological sort on this instance and 15.6 seconds for the nonrecursive version. It now takes less than a second in both cases.
My changes have been merged into the master branch, and have been released in version 1.8.