Big Data/Analytics Zone is brought to you in partnership with:

Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j building sophisticated solutions to challenging data problems. When he's not with customers Mark is a developer on Neo4j and writes his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB and is not an employee of DZone and has posted 520 posts at DZone. You can read more from them at their website. View Full User Profile

Algorithm of the Week: Bellman-Ford Algorithm in Python

02.05.2013
| 6503 views |
  • submit to reddit

The latest problem of the Algorithms 2 class required us to write an algorithm to calculate the shortest path between two nodes on a graph and one algorithm which allows us to do this is Bellman-Ford.

Bellman-Ford computes the single source shortest path which means that if we have a 5 vertex graph we’d need to run it 5 times to find the shortest path for each vertex and then find the shortest paths of those shortest paths.

One nice thing about Bellman-Ford compared to Djikstra is that it’s able to handle negative edge weights.

The pseudocode of the algorithm is as follows:

  • Let A = 2-D array of size n * n where n is the number of vertices
  • Initialise A[0,s] = 0; A[0,v] = infinity for all v ≠ s where s is the node we’re finding the shortest path for
  • for i=1,2,3,…n-1
    • for each v ∈ V
      • A[i,v] = min { A[i-1, v],

         min (w,v) ∈ E { A[k-1, w] + Costwv }
         &nbsp}
      • where (w,v) are the incoming edges of vertex v
  • check for negative cycles by running one more time and checking A[n] = A[n-1]
  • If they are equal then return A[n-1] otherwise report a negative cycle.

My first version looked like this:

import os
file = open(os.path.dirname(os.path.realpath(__file__)) + "/g_small.txt")
vertices, edges = map(lambda x: int(x), file.readline().replace("\n", "").split(" "))
 
adjacency_list = [[] for k in xrange(vertices)]
for line in file.readlines():
    tail, head, weight = line.split(" ")
    adjacency_list[int(head)-1].append({"from" : int(tail), "weight" : int(weight)})
 
s=0
cache = [[0 for k in xrange(vertices+1)] for j in xrange(vertices+1)]
cache[0][s] = 0
 
for v in range(0, vertices):
  if v != s:
    cache[0][v] = float("inf")
 
for i in range(1, vertices):
  for v in range(0, vertices):
    least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, cache[i-1])
    cache[i][v] = min(cache[i-1][v], least_adjacent_cost)
 
# detecting negative cycles
for v in range(0, vertices):
  least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, cache[vertices-1])  
  cache[vertices][v] = min(cache[vertices-1][v], least_adjacent_cost)
 
if(not cache[vertices] == cache[vertices-1]):
    raise Exception("negative cycle detected")
 
shortest_path = min(cache[vertices-1])
print("Shortest Path: " + str(shortest_path))

where calculate_least_adjacent_cost is defined like so:

def calculate_least_adjacent_cost(adjacency_list, i, v, cache):
    adjacent_nodes = adjacency_list[v]
 
    least_adjacent_cost = float("inf")
    for node in adjacent_nodes:
      adjacent_cost = cache[node["from"]-1] + node["weight"]
      if adjacent_cost < least_adjacent_cost:
        least_adjacent_cost = adjacent_cost
    return least_adjacent_cost

As you can see we’re using an adjacency list as our graph representation, mapping from each node to its corresponding edges. We then initialise the cache as per the algorithm and then have two nested for loops which we use to work out the shortest path from s to each vertex.

The calculate_least_adjacent_cost function is used to work out which of the incoming edges to a vertex has the lowest cost, taking into account previous shortest path calculations that we’ve done up to the source vertices of those incoming edges.

We then have an extra call at the end to check for negative cycles. If there is no change in the values calculated from s to each vertex then we know we don’t have any negative cycles because otherwise one of them would have come into effect and given us a different result.

This algorithm works but it’s inefficient in its use of space – our cache has size n*n when we only ever care about 2 of the rows – the previous and current ones – so we can just use a list instead.

If we do this the code now looks like this:

s=0
cache = [[] for j in xrange(vertices+1)]
cache[s] = 0
 
for v in range(0, vertices):
  if v != s:
    cache[v] = float("inf")
 
for i in range(1, vertices):
  for v in range(0, vertices):
    previous_cache = cache
    least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, previous_cache)
    cache[v] = min(previous_cache[v], least_adjacent_cost)
 
# detecting negative cycles
for v in range(0, vertices):
  previous_cache = copy.deepcopy(cache)
  least_adjacent_cost = calculate_least_adjacent_cost(adjacency_list, i, v, previous_cache)
  cache[v] = min(previous_cache[v], least_adjacent_cost)
 
if(not cache == previous_cache):
    raise Exception("negative cycle detected")

By doing this we lose the history of the algorithm over the run which means in its current state we wouldn’t be able to work our what the shortest path was, we just know its cost.

For a 1000 node, 47978 edge graph it takes 27 seconds to go over the whole graph and detect a negative cycle if there is one.

The code for this algorithm is on github as always.

Published at DZone with permission of Mark Needham, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)