I'm using Pandas to populate 6 new variables with values that are conditional on other variables in the data. The entire dataset consists of about 700,000 rows and 14 variables (columns), including my newly added ones.
My first approach was to use itertuples(), mainly because I have little experience with Pandas. This took around 9600 seconds.
I've managed to make this more efficient (~3500 seconds) by using apply(). Here is an example of one of the new variables.
housing_df = utils.make_data_frame("data/source_data/housing_with_child.dta", "stata") …
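Since the code above is truncated, here is a minimal sketch of the general pattern under discussion: a new column filled by a row-wise apply(), next to a column-wise equivalent built with numpy.select. The `age` column, the `classify` function, and the category labels are hypothetical stand-ins and are not taken from the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real columns are not shown in the question.
df = pd.DataFrame({"age": [3, 10, 17, 40]})


def classify(row):
    """Hypothetical rule: derive a label from another column's value."""
    if row["age"] < 5:
        return "infant"
    if row["age"] < 18:
        return "child"
    return "adult"


# Row-wise approach: the Python function is called once per row.
df["group_apply"] = df.apply(classify, axis=1)

# Column-wise approach: each condition is evaluated on the whole column,
# and the first condition that matches a row determines its value.
conditions = [df["age"] < 5, df["age"] < 18]
choices = ["infant", "child"]
df["group_select"] = np.select(conditions, choices, default="adult")
```

Both columns hold the same values here. The difference is that the second version avoids a Python-level function call per row, which is usually where the time goes on a frame with hundreds of thousands of rows.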