# Data pipes with pandas

Lately, I have kept myself busy reading the `pandas` documentation. I am always happy when I find something useful that I didn’t know before. One of the things I’ve recently discovered is piping.

The idea is simple. Suppose you want to apply a function to a data frame or series, then apply another function to the result, then another, and so on. One way would be to perform these operations in a “sandwich”-like fashion:
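As a minimal sketch of that nested style, assume three hypothetical functions `h`, `g` and `f` (their names, bodies and arguments are made up for illustration and are not from the original post):

```python
import pandas as pd

# Hypothetical helper functions, used only for illustration.
def h(df):
    return df + 1

def g(df, arg1):
    return df * arg1

def f(df, arg2, arg3):
    return (df - arg2) / arg3

df = pd.DataFrame({"x": [1, 2, 3]})

# Nested calls: h runs first, then g, then f, so the code
# has to be read from the inside out.
nested = f(g(h(df), arg1=2), arg2=4, arg3=2)
print(nested)
```

Note how each function's arguments end up far away from the function name once a few calls are nested.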

In the long run, this notation becomes fairly messy and error-prone. What you want to do here is use `pipe()`. Pipe can be thought of as function chaining. This is how you’d perform the same task as before with `pipe()`:
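A sketch of the chained version, again with three hypothetical functions `h`, `g` and `f` that exist only for illustration. `DataFrame.pipe(func, *args, **kwargs)` passes the data frame as the first argument of `func`:

```python
import pandas as pd

# Hypothetical helper functions, used only for illustration.
def h(df):
    return df + 1

def g(df, arg1):
    return df * arg1

def f(df, arg2, arg3):
    return (df - arg2) / arg3

df = pd.DataFrame({"x": [1, 2, 3]})

# Chained calls: the steps read top to bottom in the order they run,
# and each function keeps its own arguments next to it.
piped = (
    df.pipe(h)
      .pipe(g, arg1=2)
      .pipe(f, arg2=4, arg3=2)
)

# Same result as the nested form f(g(h(df), arg1=2), arg2=4, arg3=2).
print(piped.equals(f(g(h(df), arg1=2), arg2=4, arg3=2)))
```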

This is cleaner, and it helps keep track of the order in which the functions and their corresponding arguments are applied.

## Example 1

Suppose, for a moment, as strange as it may sound, that you want to apply the following three functions to a data set or series. The first function adds a number to the data. The second function divides the data by a given parameter. The third function multiplies the data by a given parameter and then subtracts another given number.

Here is the data set.

| | Col 1 | Col 2 | Col 3 |
|---|---|---|---|
| A | 1 | 1 | 1 |
| B | 2 | 2 | 2 |
| C | 3 | 3 | 3 |
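One way to build this data set as a `pandas` data frame, as a small sketch (the variable name `data` is my choice here):

```python
import pandas as pd

# Three identical columns, with rows labelled A, B and C.
data = pd.DataFrame(
    {"Col 1": [1, 2, 3], "Col 2": [1, 2, 3], "Col 3": [1, 2, 3]},
    index=["A", "B", "C"],
)
print(data)
```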

I have already created the functions. Here is how we apply them using `pipe()`:
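A sketch of what such a pipeline could look like. Only the signature `adder(data, add)` is given in this post. The function bodies, and the parameter names of `div` and `sub_mult`, are assumptions made for illustration:

```python
import pandas as pd

def adder(data, add):
    # Add a number to every entry.
    return data + add

def div(data, divisor):  # parameter name is an assumption
    # Divide every entry by a given parameter.
    return data / divisor

def sub_mult(data, mult, sub):  # parameter names are assumptions
    # Multiply every entry, then subtract a number.
    return data * mult - sub

data = pd.DataFrame(
    {"Col 1": [1, 2, 3], "Col 2": [1, 2, 3], "Col 3": [1, 2, 3]},
    index=["A", "B", "C"],
)

# Each step receives the output of the previous one as its first argument.
result = (
    data.pipe(adder, 2)
        .pipe(div, 2)
        .pipe(sub_mult, mult=2, sub=2)
)
print(result)
```

Because of the division, the values come back as floats, but they are numerically equal to the original entries.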

First, I add two to every entry of the data set with the `adder` function. Then, I use the `div` function to divide by two. Finally, I reverse the whole process with the `sub_mult` function, multiplying by two and subtracting two. Unsurprisingly, applying the functions in that order gives us back our original data set. Here is how the data set is transformed at every stage of our pipe:

Original data:

| | Col 1 | Col 2 | Col 3 |
|---|---|---|---|
| A | 1 | 1 | 1 |
| B | 2 | 2 | 2 |
| C | 3 | 3 | 3 |

After `adder` (add two):

| | Col 1 | Col 2 | Col 3 |
|---|---|---|---|
| A | 3 | 3 | 3 |
| B | 4 | 4 | 4 |
| C | 5 | 5 | 5 |

After `div` (divide by two):

| | Col 1 | Col 2 | Col 3 |
|---|---|---|---|
| A | 1.5 | 1.5 | 1.5 |
| B | 2 | 2 | 2 |
| C | 2.5 | 2.5 | 2.5 |

After `sub_mult` (multiply by two, subtract two):

| | Col 1 | Col 2 | Col 3 |
|---|---|---|---|
| A | 1 | 1 | 1 |
| B | 2 | 2 | 2 |
| C | 3 | 3 | 3 |

**Note**: To use `pipe()` directly, the first argument of the function must be the data set. For example, `adder` accepts two arguments, `adder(data, add)`. Since `data` is the first parameter and takes in the data set, we can use `pipe()` directly. When this is not the case, no sweat: there’s a way around it. We only need to tell `pipe` the name of the function argument that refers to the data set.
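To make the first-argument rule concrete, here is a small sketch showing that `data.pipe(adder, 2)` behaves like the plain call `adder(data, 2)`. The body of `adder` is an assumption, since only its signature is given:

```python
import pandas as pd

def adder(data, add):
    # Assumed body: add a number to every entry.
    return data + add

data = pd.DataFrame(
    {"Col 1": [1, 2, 3], "Col 2": [1, 2, 3], "Col 3": [1, 2, 3]},
    index=["A", "B", "C"],
)

# pipe() hands the data frame to adder as its first positional argument,
# followed by any extra arguments given to pipe().
piped = data.pipe(adder, 2)
direct = adder(data, 2)

print(piped.equals(direct))
```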

## Example 2

Suppose, now, that the function `adder` is specified as `adder(add, data)`. As the data is not the first argument, we need to pass it to `pipe` as follows:
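A sketch of this call, assuming `adder` simply adds `add` to `data` (the body is not given in the post). `pipe()` accepts a tuple `(function, keyword_name)`, where the string names the parameter that should receive the data set:

```python
import pandas as pd

def adder(add, data):
    # Assumed body: the data set is now the second parameter.
    return data + add

data = pd.DataFrame(
    {"Col 1": [1, 2, 3], "Col 2": [1, 2, 3], "Col 3": [1, 2, 3]},
    index=["A", "B", "C"],
)

# The tuple tells pipe() to pass the data frame as the keyword
# argument "data"; the remaining argument 2 fills the "add" parameter.
result = data.pipe((adder, "data"), 2)
print(result)
```

Writing `data.pipe((adder, "data"), add=2)` should work the same way and makes the role of the `2` more explicit.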

In the end, we get the same result as before: 2 is added to every entry.

| | Col 1 | Col 2 | Col 3 |
|---|---|---|---|
| A | 1 | 1 | 1 |
| B | 2 | 2 | 2 |
| C | 3 | 3 | 3 |

| | Col 1 | Col 2 | Col 3 |
|---|---|---|---|
| A | 3 | 3 | 3 |
| B | 4 | 4 | 4 |
| C | 5 | 5 | 5 |