Overview
The .chem.lag() method is a custom pandas accessor that creates lagged versions of one or more variables in a DataFrame. It operates by shifting the specified columns by a given number of time periods and merging the result back onto the original DataFrame.
This is especially useful for time-series panel data where each row belongs to a unique unit (e.g., company, experiment, patient) over time.
For a concrete example of usage, refer to https://pychemist.com/creating-lag-and-lead-variables/.
Accessor Registration
This method is registered with pandas under the accessor name .chem. Access the method like this:

    df.chem.lag(...)

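For readers unfamiliar with accessors, the following is a minimal sketch of how a DataFrame accessor can be registered through pandas' `pd.api.extensions.register_dataframe_accessor`. The accessor name `demo_chem`, the class `DemoChemAccessor`, and the method `n_rows` are hypothetical and exist only for illustration; this is not the package's actual registration code.

```python
import pandas as pd


# Hypothetical accessor for illustration only; the name "demo_chem" is chosen
# so that it cannot clash with the real .chem accessor.
@pd.api.extensions.register_dataframe_accessor("demo_chem")
class DemoChemAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes in the DataFrame the accessor was called on.
        self._obj = pandas_obj

    def n_rows(self) -> int:
        """Toy method: return the number of rows in the wrapped DataFrame."""
        return len(self._obj)


df = pd.DataFrame({"id": [1, 1, 2], "mass": [10, 15, 5]})
row_count = df.demo_chem.n_rows()  # methods are reached through the accessor name
```

Once the decorated class has been imported, every DataFrame exposes the accessor, which is why calls take the form df.<accessor>.<method>(...).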
Method Signature

    df.chem.lag(variables, identifier, time, shift=1, *, replace=False)

Parameters
| Parameter | Type | Description |
|---|---|---|
| variables | str or list of str | Column(s) for which to create lagged versions. |
| identifier | str | The name of the column identifying individual units (e.g., subject ID or group). |
| time | str | The name of the time column. Used to shift values within groups. |
| shift | int, default 1 | Number of time periods to shift. Positive values create lags. Negative values are not allowed. |
| replace | bool, default False | If True, overwrites existing lagged columns. If False, raises an error if there’s a naming conflict. |
Returns
- A modified copy of the original pd.DataFrame, with lagged versions of the specified variables added as new columns.
Behavior
- Verifies input types and column existence.
- Constructs a lagged version of the selected variables by shifting the time column forward by the given shift amount.
- Merges this lagged DataFrame back onto the original, based on the identifier and time.
- New columns are suffixed with:
  - _lag for shift=1
  - _lagN for shift=N (e.g., _lag3 for shift=3)
- If replace=False and any of the output columns already exist, the method raises a ValueError.
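Because the lag is built by shifting the time column and merging on identifier and time, it refers to a time period rather than to the previous row. The sketch below illustrates that difference in plain pandas on a panel with a gap. It assumes a left merge (the merge type is not stated above) and uses made-up data; the row-based groupby(...).shift(1) is included only for comparison.

```python
import pandas as pd

# Unit 1 has no observation at time 3, so the panel has a gap.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "time": [1, 2, 4, 1, 2],
    "mass": [10, 15, 20, 5, 10],
})

# Time-based lag: a value observed at t becomes the lag for period t + 1.
lagged = df[["id", "time", "mass"]].copy()
lagged["time"] = lagged["time"] + 1
lagged = lagged.rename(columns={"mass": "mass_lag"})
# Assumption: a left merge, so every original row is kept.
time_based = df.merge(lagged, on=["id", "time"], how="left")

# Row-based lag for comparison: the previous row within each id, whatever its time.
row_based = df.sort_values(["id", "time"]).groupby("id")["mass"].shift(1)
```

For the row with id 1 at time 4, the time-based lag is missing because there is no observation at time 3, whereas the row-based shift silently carries over the value from time 2.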
Notes
- The original DataFrame remains unchanged.
- Supports multiple variables and vectorized operations.
- Designed for panel or longitudinal data.
- For negative shifts (lead variables), refer to .chem.lead().
Example

    import pandas as pd
    # The package that registers the .chem accessor must also be imported.

    df = pd.DataFrame({
        "id": [1, 1, 1, 2, 2, 2],
        "time": [1, 2, 3, 1, 2, 3],
        "mass": [10, 15, 20, 5, 10, 15]
    })

    df_lagged = df.chem.lag(variables="mass", identifier="id", time="time", shift=1)

This will produce a new column called mass_lag with the mass value from the previous time step (by id).
Or more concisely:

    df_lagged = df.chem.lag("mass", "id", "time")

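To make the expected values concrete without the accessor installed, the sketch below reproduces them with plain pandas. It uses groupby(...).shift(1) as a stand-in, which is a different technique from the accessor's merge-based approach; the two agree here only because the example panel is sorted and has no gaps. It also assumes that the first observation of each id receives a missing value, since it has no previous time step.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "mass": [10, 15, 20, 5, 10, 15],
})

# Stand-in for df.chem.lag("mass", "id", "time"): previous row within each id.
expected = df.sort_values(["id", "time"]).copy()
expected["mass_lag"] = expected.groupby("id")["mass"].shift(1)
print(expected)
```

In this stand-in, mass_lag is NaN, 10, 15 for id 1 and NaN, 5, 10 for id 2.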
Common Use Cases
- Creating lagged predictors in time-series regression.
- Modeling delayed effects in experiments.
- Comparing values across sequential periods.
Error Handling
- TypeError if variables is not a string or list of strings.
- TypeError if replace is not a boolean.
- TypeError if shift is not a positive integer.
- ValueError if any specified variable is missing from the DataFrame.
- ValueError if a lagged column already exists and replace=False.
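The checks listed above could be arranged roughly as in the following sketch. The function check_lag_args and its error messages are hypothetical; only the exception types and the conditions that trigger them are taken from the list, and the real method may order or word its checks differently.

```python
import pandas as pd


def check_lag_args(df, variables, shift=1, *, replace=False):
    """Hypothetical validator mirroring the documented errors.

    Returns the names of the lagged columns that would be created.
    """
    if isinstance(variables, str):
        variables = [variables]
    if not isinstance(variables, list) or not all(isinstance(v, str) for v in variables):
        raise TypeError("variables must be a string or a list of strings")
    if not isinstance(replace, bool):
        raise TypeError("replace must be a boolean")
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(shift, bool) or not isinstance(shift, int) or shift < 1:
        raise TypeError("shift must be a positive integer")
    missing = [v for v in variables if v not in df.columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    suffix = "_lag" if shift == 1 else f"_lag{shift}"
    new_names = [f"{v}{suffix}" for v in variables]
    clashes = [n for n in new_names if n in df.columns]
    if clashes and not replace:
        raise ValueError(f"columns already exist: {clashes}")
    return new_names


df = pd.DataFrame({"id": [1, 1], "time": [1, 2], "mass": [10, 15], "mass_lag": [0, 10]})
try:
    check_lag_args(df, "mass")  # mass_lag already exists and replace=False
except ValueError as err:
    message = str(err)
```

Passing replace=True to the same call returns the column names instead of raising, which corresponds to the overwrite behavior described for the replace parameter.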
Internals
The method:
- Creates a shifted copy of the target columns using df[time] + shift.
- Applies suffixes like _lag, _lag2, etc.
- Merges the lagged columns back onto the original DataFrame using pd.merge(...).
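The three steps above can be sketched as a standalone function. This is an illustrative re-implementation under stated assumptions, not the package's source: the name lag_sketch is hypothetical, a left merge is assumed, and replacing is assumed to mean dropping the existing lag columns before merging.

```python
import pandas as pd


def lag_sketch(df, variables, identifier, time, shift=1, *, replace=False):
    """Illustrative version of the documented steps (not the package's code)."""
    variables = [variables] if isinstance(variables, str) else list(variables)
    suffix = "_lag" if shift == 1 else f"_lag{shift}"
    new_names = {v: f"{v}{suffix}" for v in variables}

    existing = [n for n in new_names.values() if n in df.columns]
    if existing and not replace:
        raise ValueError(f"columns already exist: {existing}")

    # Step 1: shifted copy; a value observed at t becomes the lag for t + shift.
    lagged = df[[identifier, time] + variables].copy()
    lagged[time] = lagged[time] + shift
    # Step 2: apply the suffix to the lagged columns.
    lagged = lagged.rename(columns=new_names)
    # Step 3: merge back (left merge assumed); old lag columns are dropped first
    # when replacing. df itself is never modified.
    base = df.drop(columns=existing)
    return pd.merge(base, lagged, on=[identifier, time], how="left")


df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "mass": [10, 15, 20, 5, 10, 15],
})
out = lag_sketch(df, "mass", "id", "time", shift=2)
```

With shift=2 the new column is named mass_lag2, and only the third observation of each id has a non-missing value, since earlier rows have no observation two periods back.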
See Also
- .chem.mutate() – for conditional column assignment
- pd.DataFrame.shift() – basic shifting
