Overview
The .chem.lead()
method is a custom pandas accessor that creates lead versions of one or more variables in a DataFrame. It works by shifting the specified columns backward in time (i.e., to future periods) and merging the result back onto the original DataFrame.
This is especially useful for panel or time-series data where each row represents an observation at a time point for a particular unit (e.g., experiment, company, individual).
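As a formula, let x be the variable, i the unit given by `identifier`, t the value in the `time` column, and k the `shift`. The lead column for row (i, t) then holds the value of x for the same unit k periods later. If unit i has no row at time t + k, nothing matches in the merge, and the lead should be missing. This assumes the merge keeps every original row, which the Returns section implies.

```latex
x^{\mathrm{lead}(k)}_{i,t} = x_{i,\,t+k}, \qquad k \in \{1, 2, 3, \dots\}
```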
For a concrete example of usage, refer to https://pychemist.com/creating-lag-and-lead-variables/.
Accessor Registration
This method is registered with pandas under the accessor name `.chem`. Access it like this:
```python
df.chem.lead(...)
```
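If you have not used pandas accessors before, the sketch below shows what a typical registration looks like. It uses pandas' `pd.api.extensions.register_dataframe_accessor` decorator. The class name `ChemAccessor` is hypothetical, and the method body is deliberately left out, so this is not pychemist's source. Running it in a session where pychemist is already imported would replace the real `.chem` accessor.

```python
import pandas as pd


# Hypothetical stand-in for the accessor class; pychemist's real class name
# and internals are not documented on this page.
@pd.api.extensions.register_dataframe_accessor("chem")
class ChemAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes in the DataFrame the accessor is accessed from.
        self._obj = pandas_obj

    def lead(self, variables, identifier, time, shift=1, *, replace=False):
        # Body omitted in this sketch; only the documented signature is shown.
        raise NotImplementedError


df = pd.DataFrame({"id": [1], "time": [1], "mass": [10]})
print(type(df.chem).__name__)  # ChemAccessor
```

Because pandas caches the accessor instance on the DataFrame, `df.chem` always refers back to the same `df`.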
Method Signature
```python
df.chem.lead(variables, identifier, time, shift=1, *, replace=False)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `variables` | str or list of str | Column(s) for which to create lead versions. |
| `identifier` | str | Name of the column identifying individual units (e.g., subject ID or group). |
| `time` | str | Name of the time column. Values are shifted along this column within each unit. |
| `shift` | int, default 1 | Number of time periods to look ahead. Must be a positive integer. |
| `replace` | bool, default False | If `True`, overwrites existing lead columns. If `False`, raises a `ValueError` when a lead column with the target name already exists. |
Returns
- A new `pd.DataFrame` (a copy of the original) with lead versions of the specified variables added as new columns.
Behavior
- Validates parameter types and checks that the referenced columns exist.
- Creates a shifted version of the selected variables by subtracting `shift` from the `time` column.
- Merges this lead DataFrame back into the original, using `identifier` and `time` as keys.
- Names the new columns with a suffix:
  - `_lead` for `shift=1`
  - `_leadN` for `shift=N` (e.g., `_lead3` for `shift=3`)
- If `replace=False` and a target lead column already exists, raises a `ValueError`.
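The naming and conflict rules above can be sketched with two small functions. `lead_names` and `check_lead_conflicts` are hypothetical helpers written for this page, not part of pychemist. The sketch only mirrors the documented suffixes and the `replace=False` check.

```python
import pandas as pd


def lead_names(variables, shift=1):
    """Map each variable to its lead column name (hypothetical helper)."""
    if isinstance(variables, str):
        variables = [variables]
    # Documented rule: "_lead" for shift=1, "_leadN" for shift=N.
    suffix = "_lead" if shift == 1 else f"_lead{shift}"
    return {v: v + suffix for v in variables}


def check_lead_conflicts(df, names, replace=False):
    """Raise ValueError if a target lead column exists and replace is False."""
    clashes = [n for n in names.values() if n in df.columns]
    if clashes and not replace:
        raise ValueError(f"lead column(s) already exist: {clashes}")
    return clashes  # columns that a replace=True call would overwrite


print(lead_names("mass"))                     # {'mass': 'mass_lead'}
print(lead_names(["mass", "temp"], shift=3))  # {'mass': 'mass_lead3', 'temp': 'temp_lead3'}

df = pd.DataFrame({"id": [1], "time": [1], "mass": [10], "mass_lead": [None]})
print(check_lead_conflicts(df, lead_names("mass"), replace=True))  # ['mass_lead']
```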
Notes
- The original DataFrame remains unchanged.
- Supports multiple variables and vectorized group-wise operations.
- Useful for forecasting models or previewing future values.
- For backward-looking operations, see `.chem.lag()`.
Example
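```python
import pandas as pd

# Assumes pychemist is installed and imported, so the .chem accessor is registered.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "mass": [10, 15, 20, 5, 10, 15],
})

df2 = df.chem.lead(variables="mass", identifier="id", time="time", shift=1)
```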
This adds a new column, `mass_lead`, holding the `mass` value from the next time step within each `id`. The last time step of each `id` has no later observation, so its lead should be missing (NaN).
Or, more concisely, with positional arguments and the default `shift=1`:
```python
df2 = df.chem.lead("mass", "id", "time")
```
Common Use Cases
- Creating future-looking predictors in modeling.
- Forecast validation (comparing current and future state).
- Detecting upcoming transitions or changes in sequence.
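For the "upcoming transitions" case, you can compare each value with its lead once the lead column exists. The frame below is written out by hand in the shape `.chem.lead()` is documented to return. The `state` and `state_lead` columns and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hand-written frame; "state_lead" follows the documented "_lead" suffix.
df = pd.DataFrame({
    "id":         [1, 1, 1, 2, 2, 2],
    "time":       [1, 2, 3, 1, 2, 3],
    "state":      ["solid", "solid", "liquid", "liquid", "gas", "gas"],
    "state_lead": ["solid", "liquid", np.nan, "gas", "gas", np.nan],
})

# Flag rows whose next observed state differs from the current one.
# Rows with no future observation (NaN lead) are never flagged.
df["transition_next"] = df["state_lead"].notna() & (df["state"] != df["state_lead"])
print(df["transition_next"].tolist())  # [False, True, False, True, False, False]
```

The `notna()` guard matters because `"liquid" != NaN` evaluates to `True`. Without the guard, every unit's last row would be flagged as a transition.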
Error Handling
- `TypeError` if `variables` is not a string or a list of strings.
- `TypeError` if `replace` is not a boolean.
- `TypeError` if `shift` is not a positive integer.
- `ValueError` if a specified column does not exist.
- `ValueError` if a lead column already exists and `replace=False`.
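The sketch below mirrors the documented type and column checks. `validate_lead_args` is a hypothetical function, and pychemist's actual checks and messages may differ. It accepts any `numbers.Integral` (including NumPy integers) as `shift` but rejects `bool`, since `True` would otherwise pass as `1`. That is a design choice of this sketch. The lead-column conflict check is not included because it depends on the generated column names.

```python
import numbers

import pandas as pd


def validate_lead_args(df, variables, identifier, time, shift=1, replace=False):
    """Hypothetical checks mirroring the documented TypeError/ValueError cases."""
    # A single column name is normalized to a one-element list.
    if isinstance(variables, str):
        variables = [variables]
    elif not (isinstance(variables, list) and all(isinstance(v, str) for v in variables)):
        raise TypeError("variables must be a string or a list of strings")
    if not isinstance(replace, bool):
        raise TypeError("replace must be a boolean")
    # bool is an Integral subtype, so True/False are rejected explicitly.
    if isinstance(shift, bool) or not isinstance(shift, numbers.Integral) or shift < 1:
        raise TypeError("shift must be a positive integer")
    missing = [c for c in [identifier, time, *variables] if c not in df.columns]
    if missing:
        raise ValueError(f"column(s) not found: {missing}")
    return variables


df = pd.DataFrame({"id": [1, 2], "time": [1, 1], "mass": [10, 5]})
base = {"variables": "mass", "identifier": "id", "time": "time"}
# Each override triggers one of the documented errors; print which one.
for bad in [{"variables": 3}, {"shift": 0}, {"replace": "yes"}, {"variables": "temp"}]:
    try:
        validate_lead_args(df, **{**base, **bad})
    except (TypeError, ValueError) as exc:
        print(type(exc).__name__, exc)
```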
Internals
The method:
- Copies the relevant subset of the DataFrame.
- Shifts the `time` column backward (`df[time] - shift`) so that future values align with the current row.
- Applies suffixes such as `_lead` or `_leadN`.
- Merges the lead values back into the original DataFrame using `pd.merge(...)`.
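The four steps above can be traced in plain pandas on the data from the Example section. This is a sketch of the described technique, not pychemist's code. In particular, the left merge (`how="left"`) is an assumption based on the method returning every original row.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "mass": [10, 15, 20, 5, 10, 15],
})
shift = 1

# Step 1: copy the relevant subset (keys plus the variable to lead).
subset = df[["id", "time", "mass"]].copy()

# Step 2: move time back by `shift`, so the value observed at t + 1
# is now labelled t.
subset["time"] = subset["time"] - shift

# Step 3: apply the suffix ("_lead" for shift=1).
subset = subset.rename(columns={"mass": "mass_lead"})

# Step 4: merge on the keys; a left merge (assumed) keeps every original row.
out = df.merge(subset, on=["id", "time"], how="left")
print(out)
```

After step 2, the first observation of each unit is labelled time 0, which matches no original row, so the left merge discards it. The last period of each unit finds no partner and gets NaN. Because the merge matches on time values rather than row positions, this approach also assumes each (`id`, `time`) pair appears only once. Duplicate pairs would multiply rows in the merge.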
See Also
- `.chem.lag()` – for creating lagged (past) variables
- `.chem.mutate()` – for conditional column assignment
- `pd.DataFrame.shift()` – basic shifting
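Row-based shifting in pandas (`groupby(...).shift(-1)`) and the time-based merge described under Internals agree on panels without gaps. They differ when a unit skips a period. The sketch below contrasts the two on a small, made-up panel where unit 1 has no observation at time 2.

```python
import pandas as pd

# Made-up panel: unit 1 skips time 2.
df = pd.DataFrame({"id": [1, 1, 1], "time": [1, 3, 4], "mass": [10, 30, 40]})

# Positional lead: the next row within each id, whatever its time value.
positional = df.sort_values(["id", "time"]).groupby("id")["mass"].shift(-1)

# Time-based lead: merge on time - 1 (the merge approach from Internals).
shifted = df.assign(time=df["time"] - 1).rename(columns={"mass": "mass_lead"})
time_based = df.merge(shifted, on=["id", "time"], how="left")["mass_lead"]

print(positional.tolist())  # [30.0, 40.0, nan]
print(time_based.tolist())  # [nan, 40.0, nan]
```

The positional version pairs time 1 with time 3. The time-based version treats time 2 as missing and leaves the lead at time 1 empty.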