Power Flow Study for Power Systems

Power flow studies are one of the most important aspects of power system planning and operation. The power flow gives us the sinusoidal steady state characteristics of the entire system - voltages, real and reactive power generated, as well as absorbed and line losses. With this, one can obtain the voltage magnitudes and angles at each bus, the generation of each generating unit, as well as the real and reactive power losses in the system. All this is necessary to ensure the security, economy, and control of electrical energy distribution.

For a power system with N buses, the steady state model of the power system is given by a set of up to 2N-2 nonlinear equations, which are iteratively solved using the Newton-Raphson or Gauss-Jacobi iterative solvers. For a system having a number of buses N in the order of 40,000 and number of lines in the order 50,000, successful computation of the power flow in real time for these types of systems requires high computational throughput. The author demonstrates the design of a power flow algorithm on an NVIDIA CUDA-capable graphics processing unit or GPU that is much faster compared to its CPU variant. In comparison to previous efforts to perform power flow studies on the GPU [1], the authors have obtained better performance of power flow analysis on both CPU and GPU.

Download the Power Flow Whitepaper in pdf form.

Performing power flow on large systems

Power flow studies involve solving nonlinear power flow equations. In this work, the author has focussed on two iterative solvers:

  • Newton-Raphson (Polar form) solver
  • Gauss-Jacobi solver

The power utilities find either of these methods suitable for the analysis of their transmission networks depending on the nature of these networks and how much time each of these solvers takes to converge. This study shows two different solvers to showcase the speed improvements obtained using the GPU.

The software used in this study uses standard IEEE data to perform power flow studies. The basic steps involved in power flow studies are:
  • Step 1 - Read bus data, generator data, and branch data.
  • Step 2 - Compute the bus admittance matrix.
  • Step 3 - Initialize the voltage magnitudes and angles at each bus.
  • Step 4 - Find the bus types - PV bus, PQ bus, or Slack bus (Figure 1).
  • Step 5 - Compute the power injected at each bus using the generation data of the generators connected at a particular bus reduced by the load demand at each bus.
  • Step 6 - Compute the mismatch between the actual power injections at each bus (computed using the admittance matrix and voltages at each bus) and the power injections calculated using generation and load data.
  • Step 7 - Estimate the new voltage magnitudes and angles at PV and PQ buses by either Newton - Raphson or Gauss - Jacobi solvers.
  • Step 8 - Update the reactive power generation of each generator using the updated voltages and the reactive power of generators connected to Slack bus.
  • Step 9 - If reactive generation limits are enforced, find out the generators that violate their reactive(Q) generation limits, set them to their limiting values, and perform the power flow study again.

Steps 4 to 9 are repeated many times until the solution converges. The computationally intensive steps in the power flow study are 5 through 9; therefore, it is important to find an appropriate implementation to execute these steps in parallel on the GPU.


Computing power injections on the GPU

The power specified vector represents the power injections at each bus computed using the generator and load data. This means that the power of all the generators connected at a bus is to be added and reduced by the load at that bus. Performing this in parallel presents difficulties as threads corresponding to number of generators will need to update total generation at each bus, which may be common as we may have a number of generators on a single bus.

This is dealt with by sorting the generator data in the order of increasing bus numbers on which they are connected. Then we compute two vectors that give information about the number of generators on each bus and the position from where a particular thread ( threads equal to number of buses) should start looking for in the sorted generator data. This has been depicted in the figure below:

If we launch threads equal to number of buses, each thread can now fetch the information from noOfGen for the number of times it has to loop starting at the address given by the cumulative vector. To further improve the performance, we find out the indices of the non-zero elements in the noOfGen vector and launch threads equal to the number of non-zero elements.

This technique avoids computing the connection matrix that describes the relationships of the generators with the buses, which is then used for computing interdependent data using matrix- matrix multiplications. This implementation computes only the non-zero elements using reduced noOfGen and cumNoOfGen vectors and avoids the need to perform the matrix-matrix multiplications generally employed in such cases.

Newton-Raphson(NR) Solver

This step involves computing Jacobian matrix according to the equations given in Figure 2. We used read-only texture memory as this is large (depending on the size of global memory) and is cached. Texture memory can be read from kernels using texture fetching device functions. Reading from texture memory is generally (not absolutely) faster than reading from global or local memory. A vector (vector_d) is formed where pv and pq hold the indices of pv and pq buses respectively.


If n represents the number of buses in the system and np gives the number of pq buses, then the jacobian matrix is a square matrix with each dimension equal to n + np -1. The mismatch vector has the same length, and its first n-1 values represent the real power mismatch for all buses except the slack bus. The next np values represent the reactive power mismatch corresponding to the pq buses. The indices of pv and pq buses are obtained from vector_d and are used to compute each element of the jacobian matrix, which can be computed independently resulting in significant speed improvement when carried out on the GPU.

After the jacobian matrix and mismatch vectors are computed, the equation is solved using the Jacket solution for GPU datatypes. The result is used to update voltage magnitudes of pq buses and voltage angles of all buses except the slack bus.

Gauss- Jacobi (GJ) Solver

The Gauss- Jacobi solver is easily parallelized using the pv and pq vectors which store the indices of pv and pq buses. The indices are used to update the voltage magnitudes and angles of pv and pq buses. Texture memory is used as this step involves fetching voltage magnitudes and angles and admittance matrix values whose sizes increase greatly as system size increases and is referenced using pv and pq vectors which may change during power flow analysis. Performing this computationally intensive step on the GPU leads to significant performance improvement.

Updating reactive power generation of each generating unit

To update the generator data, information is needed regarding the number of generators on each bus so that the total reactive power computed using the new voltage vector with the addition of load can be distributed either equally or proportionally among all the generators connected to a bus.

This step is performed in parallel by finding out the number of generators that are "ON" on each bus (using the generator status vector) and then using cumNoOfGen vector to iterate through the generator data in the same way as described in forming power injection vector.

Updating the real power of the first generator connected to the reference bus is clubbed with the step to find out the actual number of generators on each bus. Each thread that finds the actual number of generators which are "ON" at each bus also checks whether the point where it starts iterating in the generator data corresponds to the first generator connected to the reference bus. If this condition is found true, this thread updates the real power of that generator.

Finding generator violations and updating load data and generator status

A new vector initialized to zero signals if a generator has violated its reactive generation limits. This vector has non-zero values for generators that violate generation limits and is used to update the load connected at the bus at which generator violations occurred.

If any generator is found to violate its upper or lower limits, its generation is fixed at that limiting value. The load at the bus at which this generator is connected is reduced by its generation, and the generator is temporarily switched "OFF" for the remaining iterations. The bus at which the generator violations occurred is converted to PQ bus. Updating the load data requires information about the generators connected at a particular bus and is parallelized using the same methodology described in the previous sections using noOfGen and cumNoOfGen vectors.

Performance curves:

The following plots show the speed-up obtained comparing GPU and CPU versions of power flow running on TESLA C1060 versus Intel(R) Xeon(TM) CPU 2.80GHz in MATLAB achieving 28x and 35x for GJ and NR respectively:


This project aimed to develop a software approach that could perform power flow numerical analysis on the graphics processing unit, or GPU. The goal to increase performance associated with this analysis was attained using Jacket from AccelerEyes and MATLAB® from The Mathworks.


  • Jaideep Singh, Ipseeta Aruni, Indian Institute of Technology, Roorkee,, India


  • [1] A. Gopal, D. Niebur, S. Venkatasubramanian, "DC Power Flow Based Contingency Analysis Using Graphics Processing Units," Power Tech, IEEE Lausanne, 2007, pp. 731 - 736

« Back to Case Studies