Overview
The `.chem.lead()` method belongs to a custom pandas accessor, `.chem`, and creates lead versions of one or more variables in a DataFrame. It works by shifting the time column backward, so that each row lines up with the values observed in a future period, and merging the result back onto the original DataFrame.
This is especially useful for panel or time-series data where each row represents an observation at a time point for a particular unit (e.g., experiment, company, individual).
For a concrete example of usage, refer to https://pychemist.com/creating-lag-and-lead-variables/.
Accessor Registration
This method is registered with pandas under the accessor name `.chem`. Access the method like this:
```python
df.chem.lead(...)
```
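For readers unfamiliar with pandas accessors, the sketch below shows the general registration mechanism using `pd.api.extensions.register_dataframe_accessor`. The accessor name `demo`, the class `DemoAccessor`, and its `n_rows` method are hypothetical, chosen so they do not clash with `.chem`; the code behind `.chem` itself may look different.

```python
import pandas as pd


# Hypothetical accessor name "demo"; the real package registers "chem".
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes in the DataFrame the accessor is called on.
        self._obj = pandas_obj

    def n_rows(self) -> int:
        """Toy method: return the number of rows in the DataFrame."""
        return len(self._obj)


df = pd.DataFrame({"a": [1, 2, 3]})
row_count = df.demo.n_rows()  # called the same way as df.chem.lead(...)
```

Once the module containing the decorated class has been imported, every DataFrame exposes the accessor under the registered name.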
Method Signature
```python
df.chem.lead(variables, identifier, time, shift=1, *, replace=False)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| variables | str or list of str | Column(s) for which to create lead versions. |
| identifier | str | The name of the column identifying individual units (e.g., subject ID or group). |
| time | str | The name of the time column. Used to shift values within groups. |
| shift | int, default 1 | Number of time periods to shift. Must be a positive integer. |
| replace | bool, default False | If True, overwrites existing lead columns. If False, raises an error if there’s a naming conflict. |
Returns
- A modified copy of the original `pd.DataFrame`, with lead versions of the specified variables added as new columns.
Behavior
- Validates parameter types and column existence.
- Creates a shifted version of the selected variables by subtracting `shift` from the `time` column.
- Merges this lead DataFrame back into the original, using `identifier` and `time` as keys.
- New columns are suffixed with:
    - `_lead` for `shift=1`
    - `_leadN` for `shift=N` (e.g., `_lead3` for `shift=3`)
- If `replace=False` and a target lead column already exists, a `ValueError` is raised.
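The suffix rule and the `replace` check can be sketched in a few lines. The helper names `lead_column_name` and `resolve_lead_columns` are hypothetical and exist only for illustration; the accessor's internal helpers, if any, are not documented here.

```python
def lead_column_name(variable: str, shift: int = 1) -> str:
    """Build the lead column name: '_lead' for shift=1, '_leadN' otherwise."""
    return f"{variable}_lead" if shift == 1 else f"{variable}_lead{shift}"


def resolve_lead_columns(existing_columns, variables, shift=1, replace=False):
    """Return the target lead column names, raising on conflicts unless replace=True."""
    targets = [lead_column_name(v, shift) for v in variables]
    clashes = [t for t in targets if t in existing_columns]
    if clashes and not replace:
        raise ValueError(f"Lead column(s) already exist: {clashes}")
    return targets


print(lead_column_name("mass"))     # mass_lead
print(lead_column_name("mass", 3))  # mass_lead3
```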
Notes
- The original DataFrame remains unchanged.
- Supports multiple variables and vectorized group-wise operations.
- Useful for forecasting models or previewing future values.
- For backward-looking operations, see `.chem.lag()`.
Example
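```python
import pandas as pd
# The package that registers the .chem accessor must also be imported.

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "mass": [10, 15, 20, 5, 10, 15],
})

# Add mass_lead: the mass value one time step ahead within each id.
df2 = df.chem.lead(variables="mass", identifier="id", time="time", shift=1)
```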
This will produce a new column called mass_lead with the mass value from the next time step (grouped by id).
Or more concisely:
```python
df2 = df.chem.lead("mass", "id", "time")
```
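For comparison, the sample data can be given a lead column with plain pandas using `groupby(...).shift(-1)`. This is a different, position-based technique rather than the accessor's time-keyed merge: it assumes the rows are sorted by time and that no unit has gaps in its time column. With gaps, the two approaches would disagree. The variable name `expected` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "mass": [10, 15, 20, 5, 10, 15],
})

# Position-based lead: take the next row's value within each id.
# Assumes consecutive time steps within each id (no gaps).
expected = df.sort_values(["id", "time"]).copy()
expected["mass_lead"] = expected.groupby("id")["mass"].shift(-1)

# mass_lead is 15, 20, NaN for id 1 and 10, 15, NaN for id 2:
# the last observation of each id has no next time step.
print(expected)
```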
Common Use Cases
- Creating future-looking predictors in modeling.
- Forecast validation (comparing current and future state).
- Detecting upcoming transitions or changes in sequence.
Error Handling
- `TypeError` if `variables` is not a string or list of strings.
- `TypeError` if `replace` is not a boolean.
- `TypeError` if `shift` is not a positive integer.
- `ValueError` if a specified column does not exist.
- `ValueError` if a lead column already exists and `replace=False`.
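A validator following the type and column rules listed above might look like the sketch below. The function `validate_lead_args` is hypothetical, as are its error messages; only the exception types are taken from this page. The naming-conflict check is left out to keep the sketch short.

```python
import pandas as pd


def validate_lead_args(df, variables, identifier, time, shift=1, replace=False):
    """Hypothetical validator mirroring the documented exception types."""
    if isinstance(variables, str):
        variables = [variables]
    elif not (isinstance(variables, list)
              and all(isinstance(v, str) for v in variables)):
        raise TypeError("variables must be a string or a list of strings")
    if not isinstance(replace, bool):
        raise TypeError("replace must be a boolean")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(shift, bool) or not isinstance(shift, int) or shift < 1:
        raise TypeError("shift must be a positive integer")
    missing = [c for c in [*variables, identifier, time] if c not in df.columns]
    if missing:
        raise ValueError(f"Column(s) not found: {missing}")
    return variables


df = pd.DataFrame({"id": [1], "time": [1], "mass": [10]})
checked = validate_lead_args(df, "mass", "id", "time")  # returns ["mass"]
```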
Internals
The method:
- Copies the relevant subset of the DataFrame.
- Shifts the `time` column backward (`df[time] - shift`) to align future values.
- Applies suffixes such as `_lead` or `_leadN`.
- Merges the lead values back into the original DataFrame using `pd.merge(...)`.
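The four steps above can be sketched as a standalone function. This is an illustration under stated assumptions, not the package's actual code: the name `lead_sketch` is hypothetical, the time column is assumed to be numeric so that `df[time] - shift` is meaningful, and `how="left"` is an assumption about the merge type. Validation and the `replace` option are omitted.

```python
import pandas as pd


def lead_sketch(df, variables, identifier, time, shift=1):
    """Illustrative shift-and-merge lead; not the package's actual code."""
    if isinstance(variables, str):
        variables = [variables]
    suffix = "_lead" if shift == 1 else f"_lead{shift}"

    # 1. Copy the relevant subset of the DataFrame.
    lead = df[[identifier, time, *variables]].copy()
    # 2. Shift time backward so the row observed at t + shift is keyed at t.
    lead[time] = lead[time] - shift
    # 3. Apply the suffix to the value columns.
    lead = lead.rename(columns={v: f"{v}{suffix}" for v in variables})
    # 4. Merge back on identifier and time. how="left" is an assumption:
    #    it keeps every original row and fills unmatched rows with NaN.
    return pd.merge(df, lead, on=[identifier, time], how="left")


df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "mass": [10, 15, 20, 5, 10, 15],
})
result = lead_sketch(df, "mass", "id", "time")
print(result)
```

Because the merge is keyed on the time values rather than on row position, a unit with a missing period gets `NaN` for the lead in the row just before the gap, instead of silently picking up the value from a later period.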
See Also
- `.chem.lag()` – for creating lagged (past) variables
- `.chem.mutate()` – for conditional column assignment
- `pd.DataFrame.shift()` – basic shifting
