10 DTW-Based Prosody Analysis

10.1 Overview

This notebook implements two variants of Dynamic Time Warping (DTW) to measure prosodic similarity between native speaker (NS) reference recordings and student learner recordings. The core acoustic feature used is F0 (fundamental frequency), which captures the pitch contour of an utterance — a primary carrier of prosodic information such as intonation and stress patterns.

The pipeline consists of four stages:

Stage	Description
1. Standard DTW	Aligns raw z-score-normalised F0 contours and computes a distance score
2. Derivative DTW (DDTW)	Applies DTW to the first-order derivative of F0, emphasising shape over absolute pitch
3. DTW Contour Plots	Visualises pitch contours and the accumulated-cost matrix for Standard DTW
4. DDTW Contour Plots	Visualises pitch + derivative contours and the cost matrix for DDTW

10.2 Background

10.2.1 Dynamic Time Warping (DTW)

DTW is a sequence-alignment algorithm that finds the optimal non-linear mapping between two time series of different lengths. Unlike simple Euclidean distance, DTW is robust to time-axis distortions — a crucial property when comparing speech from speakers who produce the same utterance at different rates.

Given two sequences \(X = (x_1, \ldots, x_n)\) and \(Y = (y_1, \ldots, y_m)\), DTW constructs an \(n \times m\) accumulated cost matrix \(D\) where each cell stores the minimum cumulative cost to align up to that point:

\[D(i, j) = d(x_i, y_j) + \min\bigl(D(i-1,j),\; D(i,j-1),\; D(i-1,j-1)\bigr)\]

The warping path is then traced back from \(D(n, m)\) to \(D(1, 1)\), and the final DTW distance equals \(D(n, m)\).

10.2.2 Derivative DTW (DDTW)

Standard DTW can be misled by absolute pitch differences between speakers (e.g., a male NS vs. a female learner). DDTW (Keogh & Pazzani, 2001) addresses this by aligning the first-order derivatives of the two series rather than the raw values. This makes the comparison sensitive to the shape of the pitch contour — rises, falls, and plateaus — rather than its overall level.

The derivative is estimated with a symmetric finite-difference scheme:

\[x'_i = \begin{cases} x_2 - x_1 & i = 1 \\[4pt] \dfrac{x_{i+1} - x_{i-1}}{2} & 1 < i < n \\[4pt] x_n - x_{n-1} & i = n \end{cases}\]

10.2.3 F0 Extraction

F0 is extracted with librosa.pyin, a probabilistic YIN algorithm that estimates the fundamental frequency frame by frame. Unvoiced frames produce NaN values and are removed before DTW is applied.

10.2.4 Z-Score Normalisation

Before comparison, each F0 sequence is standardised to zero mean and unit variance:

\[z_i = \frac{x_i - \bar{x}}{\sigma_x}\]

This normalisation removes speaker-specific pitch range differences and makes distances interpretable across different sentence-speaker pairs.

10.3 Dependencies and Installation

Before running any code, install the two non-standard audio packages:

%%capture
# !pip install dtaidistance
!pip install librosa

Note

The %%capture magic suppresses noisy installation output in the notebook. dtaidistance is commented out because it is assumed to already be installed; uncomment it if you encounter an ImportError.

Package	Role in this project
`pandas`	Reading the audio list spreadsheet; writing result Excel files
`numpy`	Array arithmetic and finite-difference derivative computation
`librosa`	Loading `.wav` files and extracting F0 via the `pyin` algorithm
`scipy.stats`	Z-score normalisation (`zscore`)
`scipy.signal`	`find_peaks` — imported for completeness, not actively called
`dtaidistance`	Optimised Standard DTW computation (Stages 1 and 3)
`matplotlib`	Generating and saving two-panel contour figures
`PIL` (Pillow)	Imported for potential image annotation; not actively used

10.4 Data Import

10.4.1 Code

import pandas as pd
import os

# Load the combined DataFrame
df_audio_list = pd.read_excel('DTW_audio_list.xlsx')

# Directory paths
reference_dir  = 'NS_wav'       # Native Speaker Reference Audio
test_files_dir = 'Students_wav' # Student Audio Files

df_audio_list.head()

10.4.2 Explanation

This cell sets up the data index that drives every subsequent stage.

pd.read_excel('DTW_audio_list.xlsx') Reads the spreadsheet into a DataFrame. Each row represents one NS-student comparison pair. The expected columns are:

Column	Description
`Sent_NO`	Integer ID of the sentence; used to locate the NS file (e.g., `1.wav`)
`Sentence`	Text transcription of the target sentence; appears in plot titles
`Audio_File`	Filename of the student recording (e.g., `S01_Sent01.wav`)

Directory variables reference_dir and test_files_dir are string path prefixes combined with os.path.join() later to build full file paths. The working directory must contain these two subdirectories.

df_audio_list.head() Previews the first five rows to verify the spreadsheet loaded correctly before running the computationally expensive audio pipeline.

10.5 Stage 1: Standard DTW

10.5.1 Overview

Standard DTW computes a single scalar distance between two z-score-normalised F0 contours after finding the optimal non-linear time alignment. A smaller distance indicates that the student’s pitch contour more closely matches the native speaker’s.

10.5.2 Full Code

from PIL import Image, ImageDraw, ImageFont
import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial.distance import euclidean  # For distance calculation
from scipy.signal import find_peaks
import librosa                                 # For audio processing
import librosa.display
from scipy import stats                        # For z-score calculation
from dtaidistance import dtw                   # Using dtaidistance for original DTW

def extract_f0(file_path):
    y, sr = librosa.load(file_path)
    f0, voiced_flag, voiced_probs = librosa.pyin(
        y,
        fmin=librosa.note_to_hz('C2'),
        fmax=librosa.note_to_hz('C7')
    )
    return f0

# Initialize list to store results
results = []

# Process each row in the DataFrame
for index, row in df_audio_list.iterrows():
    # Reference file
    reference_file_num  = row['Sent_NO']
    reference_file_sent = row['Sentence']
    reference_file_path = os.path.join(reference_dir, f"{reference_file_num}.wav")

    # Test file
    test_file_name = row['Audio_File']
    test_file_path = os.path.join(test_files_dir, test_file_name)

    # Check if files exist
    if not os.path.exists(reference_file_path) or not os.path.exists(test_file_path):
        print(f"File not found: {reference_file_path} or {test_file_path}")
        continue

    # Load and extract F0 from the reference and test audio files
    f0_reference = extract_f0(reference_file_path)
    f0_test      = extract_f0(test_file_path)

    # Remove NaNs (in case of unvoiced segments)
    f0_reference = f0_reference[~np.isnan(f0_reference)]
    f0_test      = f0_test[~np.isnan(f0_test)]

    # Normalize the F0 values using z-score
    f0_reference_z = stats.zscore(f0_reference)
    f0_test_z      = stats.zscore(f0_test)

    # Perform DTW and get the best path
    distance, path = dtw.warping_paths(f0_reference_z, f0_test_z)
    best_path = np.array(dtw.best_path(path))

    # Store results
    results.append({
        'Sent_NO':                 reference_file_num,
        'Test File':               test_file_name,
        'DTW_Original_Distance_z': distance,
        'Sentence':                reference_file_sent
    })

# Create DataFrame from results and display
df_results = pd.DataFrame(results)
print(df_results)

# Save results to an Excel file
excel_file_path = 'DTW_Output/01_Standard_DTW_z.xlsx'
df_results.to_excel(excel_file_path, index=False)
print(f"Results saved to {excel_file_path}")

10.5.3 Explanation

10.5.3.1 Import block

The import block pulls in all required libraries for this stage. Only from dtaidistance import dtw is specific to DTW computation; the rest provide general-purpose array, audio, and statistics utilities. PIL, scipy.spatial.distance, and scipy.signal are imported for completeness but are not directly called in this stage.

10.5.3.2 `extract_f0(file_path)` — F0 extraction helper

def extract_f0(file_path):
    y, sr = librosa.load(file_path)
    f0, voiced_flag, voiced_probs = librosa.pyin(
        y,
        fmin=librosa.note_to_hz('C2'),
        fmax=librosa.note_to_hz('C7')
    )
    return f0

This reusable function wraps two librosa calls:

Line	What it does
`librosa.load(file_path)`	Decodes the `.wav` file and resamples to 22 050 Hz (librosa default). Returns waveform `y` and sample rate `sr`.
`librosa.pyin(...)`	Runs the probabilistic YIN (pYIN) pitch tracker frame by frame. Returns `f0` (Hz per frame), `voiced_flag` (boolean), and `voiced_probs` (voicing probability).
`fmin=librosa.note_to_hz('C2')`	Sets the lower search bound to ~65 Hz — below the lowest typical male speaking voice.
`fmax=librosa.note_to_hz('C7')`	Sets the upper bound to ~2093 Hz — well above the highest typical female speaking voice.
`return f0`	Returns only the pitch array; `voiced_flag` and `voiced_probs` are discarded here but could be used for voicing-based filtering.

Unvoiced frames (silence, fricatives, stops) are represented as NaN in f0.

10.5.3.3 Processing loop

for index, row in df_audio_list.iterrows():
    reference_file_path = os.path.join(reference_dir, f"{reference_file_num}.wav")
    test_file_path      = os.path.join(test_files_dir, test_file_name)

iterrows() yields one (index, Series) pair per spreadsheet row. os.path.join() constructs OS-independent file paths from the directory prefix and filename, handling slash differences across platforms automatically.

10.5.3.4 File existence check

if not os.path.exists(reference_file_path) or not os.path.exists(test_file_path):
    print(f"File not found: ...")
    continue

Guards against missing audio files. Rather than crashing the entire loop, continue skips the current pair and moves to the next row. This is important when a dataset has gaps or filename mismatches.

10.5.3.5 NaN removal

f0_reference = f0_reference[~np.isnan(f0_reference)]
f0_test      = f0_test[~np.isnan(f0_test)]

np.isnan() returns a boolean mask; ~ inverts it; boolean indexing keeps only voiced frames. DTW cannot handle NaN values — leaving them in would propagate nan to the entire distance computation.

10.5.3.6 Z-score normalisation

f0_reference_z = stats.zscore(f0_reference)
f0_test_z      = stats.zscore(f0_test)

scipy.stats.zscore subtracts the mean and divides by the standard deviation, producing a sequence with mean 0 and standard deviation 1. This step is essential because male and female speakers can differ by an octave in absolute F0, yet their intonation patterns may be nearly identical in shape.

10.5.3.7 DTW computation

distance, path = dtw.warping_paths(f0_reference_z, f0_test_z)
best_path = np.array(dtw.best_path(path))

Call	Return	Meaning
`dtw.warping_paths(x, y)`	`(distance, path_matrix)`	Fills the entire \(n \times m\) accumulated cost matrix; `distance` is the bottom-right cell value.
`dtw.best_path(path_matrix)`	List of `(i, j)` tuples	Traces the optimal alignment path from `(0,0)` to `(n-1, m-1)`.
`np.array(...)`	Shape `(L, 2)` array	Converts the path to NumPy for easy column-wise indexing during plotting.

best_path is stored but not used further in this stage; it is essential in Stage 3 for the warping-path overlay on the cost matrix heatmap.

10.5.3.8 Results collection and export

results.append({...})
df_results = pd.DataFrame(results)
df_results.to_excel('DTW_Output/01_Standard_DTW_z.xlsx', index=False)

Results are accumulated in a plain Python list and converted to a DataFrame only once at the end — growing a DataFrame row-by-row inside a loop is significantly slower due to repeated memory allocation. index=False prevents pandas from writing row numbers into the Excel file.

10.5.4 Output columns

Column	Description
`Sent_NO`	Sentence identifier
`Test File`	Student audio filename
`DTW_Original_Distance_z`	DTW distance on z-scored F0 (lower = more similar)
`Sentence`	Sentence transcription

10.6 Stage 2: Derivative DTW (DDTW)

10.6.1 Overview

DDTW compares the shape of pitch contours rather than their absolute values by running DTW on the first-order derivatives of the F0 sequences. This makes the metric less sensitive to overall pitch level and more sensitive to prosodic patterns such as rises, falls, and plateaus.

10.6.2 Full Code

from PIL import Image, ImageDraw, ImageFont
import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np
import librosa                 # For audio processing

def compute_derivative(x):
    """Compute the derivative of the time series."""
    derivative = np.zeros(len(x))
    # Compute the central difference for the middle elements
    derivative[1:-1] = (x[2:] - x[:-2]) / 2.0
    # Forward difference for the first element
    derivative[0] = x[1] - x[0]
    # Backward difference for the last element
    derivative[-1] = x[-1] - x[-2]
    return derivative

def ddtw(x, y):
    """Compute the Derivative Dynamic Time Warping (DDTW) distance."""
    # Compute derivatives of the input time series
    x_derivative = compute_derivative(x)
    y_derivative = compute_derivative(y)

    # Initialize the cost matrix
    n, m = len(x), len(y)
    cost_matrix = np.zeros((n, m))

    # Initialize the first row and column of the cost matrix
    cost_matrix[0, 0] = abs(x_derivative[0] - y_derivative[0])
    for i in range(1, n):
        cost_matrix[i, 0] = cost_matrix[i-1, 0] + abs(x_derivative[i] - y_derivative[0])
    for j in range(1, m):
        cost_matrix[0, j] = cost_matrix[0, j-1] + abs(x_derivative[0] - y_derivative[j])

    # Fill the rest of the cost matrix using dynamic programming
    for i in range(1, n):
        for j in range(1, m):
            cost = abs(x_derivative[i] - y_derivative[j])
            cost_matrix[i, j] = cost + min(
                cost_matrix[i-1, j],    # Insertion
                cost_matrix[i, j-1],    # Deletion
                cost_matrix[i-1, j-1]   # Match
            )

    # Backtrack to find the best path
    i, j = n - 1, m - 1
    path = []
    while i > 0 and j > 0:
        path.append((i, j))
        min_cost = min(cost_matrix[i-1, j], cost_matrix[i, j-1], cost_matrix[i-1, j-1])
        if min_cost == cost_matrix[i-1, j-1]:
            i -= 1
            j -= 1
        elif min_cost == cost_matrix[i-1, j]:
            i -= 1
        else:
            j -= 1
    path.append((0, 0))
    path.reverse()

    return cost_matrix[-1, -1], np.array(path)

# Initialize list to store results
results = []

for index, row in df_audio_list.iterrows():
    reference_file_num  = row['Sent_NO']
    reference_file_sent = row['Sentence']
    reference_file_path = os.path.join(reference_dir, f"{reference_file_num}.wav")
    test_file_name      = row['Audio_File']
    test_file_path      = os.path.join(test_files_dir, test_file_name)

    if not os.path.exists(reference_file_path) or not os.path.exists(test_file_path):
        print(f"File not found: {reference_file_path} or {test_file_path}")
        continue

    f0_reference = extract_f0(reference_file_path)
    f0_test      = extract_f0(test_file_path)

    f0_reference = f0_reference[~np.isnan(f0_reference)]
    f0_test      = f0_test[~np.isnan(f0_test)]

    f0_reference_z = stats.zscore(f0_reference)
    f0_test_z      = stats.zscore(f0_test)

    # Perform DDTW and get the best path
    distance, best_path = ddtw(f0_reference_z, f0_test_z)

    results.append({
        'Sent_NO':         reference_file_num,
        'Test File':       test_file_name,
        'DDTW_Distance_z': distance,
        'Sentence':        reference_file_sent
    })

df_results = pd.DataFrame(results)
print(df_results)

excel_file_path = 'DTW_Output/02_DDTW_z.xlsx'
df_results.to_excel(excel_file_path, index=False)
print(f"Results saved to {excel_file_path}")

10.6.3 Explanation

10.6.3.1 `compute_derivative(x)` — finite-difference derivative

def compute_derivative(x):
    derivative = np.zeros(len(x))
    derivative[1:-1] = (x[2:] - x[:-2]) / 2.0  # central difference
    derivative[0]    = x[1] - x[0]              # forward difference
    derivative[-1]   = x[-1] - x[-2]            # backward difference
    return derivative

This function approximates the instantaneous rate of change of the pitch contour at each frame. Three different finite-difference schemes are used depending on position:

Region	Slice notation	Formula	Accuracy
Interior frames	`[1:-1]`	\((x_{i+1} - x_{i-1}) / 2\)	Second-order (symmetric)
First frame	`[0]`	\(x_1 - x_0\)	First-order forward
Last frame	`[-1]`	\(x_{n-1} - x_{n-2}\)	First-order backward

The result is a same-length array where each value encodes whether the pitch was rising (positive), flat (near zero), or falling (negative) at that moment. This shape information is exactly what DDTW compares.

10.6.3.2 `ddtw(x, y)` — custom DDTW implementation

Step 1: Derive both series

x_derivative = compute_derivative(x)
y_derivative = compute_derivative(y)

Converts raw z-scored F0 sequences into slope sequences before DTW is applied. From this point forward, the algorithm is identical to Standard DTW except that it operates on these derivative arrays rather than the raw F0 values.

Step 2: Initialise boundary conditions

cost_matrix[0, 0] = abs(x_derivative[0] - y_derivative[0])
for i in range(1, n):
    cost_matrix[i, 0] = cost_matrix[i-1, 0] + abs(x_derivative[i] - y_derivative[0])
for j in range(1, m):
    cost_matrix[0, j] = cost_matrix[0, j-1] + abs(x_derivative[0] - y_derivative[j])

The first row and column have no predecessors on both axes, so they can only accumulate costs along a single direction. This forces the warping path to start at (0, 0) and creates the boundary for the recurrence.

Step 3: Dynamic programming fill

for i in range(1, n):
    for j in range(1, m):
        cost = abs(x_derivative[i] - y_derivative[j])
        cost_matrix[i, j] = cost + min(
            cost_matrix[i-1, j],    # Insertion: NS advances, student stays
            cost_matrix[i, j-1],    # Deletion:  student advances, NS stays
            cost_matrix[i-1, j-1]   # Match:     both advance together
        )

Each cell’s value is the local derivative distance plus the cheapest way to have arrived at that cell. The three predecessors correspond to three types of alignment moves:

Predecessor	Move	Speech interpretation
`(i-1, j)`	NS advances alone	NS pitch change has no matching student change yet
`(i, j-1)`	Student advances alone	Student pitch change has no matching NS change yet
`(i-1, j-1)`	Both advance	One NS frame matched to one student frame

Step 4: Backtrack the optimal path

i, j = n - 1, m - 1
path = []
while i > 0 and j > 0:
    path.append((i, j))
    min_cost = min(cost_matrix[i-1, j], cost_matrix[i, j-1], cost_matrix[i-1, j-1])
    if   min_cost == cost_matrix[i-1, j-1]: i -= 1; j -= 1
    elif min_cost == cost_matrix[i-1, j]:   i -= 1
    else:                                   j -= 1
path.append((0, 0))
path.reverse()

Starts at the bottom-right corner (n-1, m-1) and greedily steps toward (0, 0) by always choosing the neighbour with the lowest accumulated cost. path.append((0, 0)) manually adds the origin, and path.reverse() re-orders from start to end for consistent downstream use.

Return values

return cost_matrix[-1, -1], np.array(path)

cost_matrix[-1, -1] is the total DDTW distance (the scalar stored in the Excel output). np.array(path) is a 2-D array of shape (L, 2). Note that the full matrix is not returned here — that variant is introduced in Stage 4 for visualisation purposes.

10.6.3.3 Comparison with Standard DTW

Aspect	Standard DTW (Stage 1)	DDTW (Stage 2)
Input to DTW	Raw z-scored F0	First derivative of z-scored F0
Local cost	`\|f0_ref[i] - f0_test[j]\|`	`\|Df0_ref[i] - Df0_test[j]\|`
Sensitivity	Absolute pitch level	Pitch slope (rising/falling)
Implementation	`dtaidistance` C extension	Pure Python nested loop
Output column	`DTW_Original_Distance_z`	`DDTW_Distance_z`

Warning

The pure Python double loop in ddtw() is \(O(n \cdot m)\) in both time and memory. For long utterances this can be slow. If performance is a concern, consider a NumPy-vectorised or Cython implementation.

10.7 Stage 3: DTW Contour Plots

10.7.1 Overview

For every NS-student pair, this stage produces a two-panel figure saved as DTW_Output/Contours/DTW_<student_filename>.png. The left panel overlays the raw pitch contours; the right panel shows the accumulated cost matrix with the optimal warping path.

10.7.2 Full Code

import os
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from dtaidistance import dtw

os.makedirs('DTW_Output/Contours', exist_ok=True)

for index, row in df_audio_list.iterrows():
    reference_file_num  = row['Sent_NO']
    reference_file_sent = row['Sentence']
    reference_file_path = os.path.join(reference_dir, f"{reference_file_num}.wav")
    test_file_name      = row['Audio_File']
    test_file_path      = os.path.join(test_files_dir, test_file_name)

    if not os.path.exists(reference_file_path) or not os.path.exists(test_file_path):
        print(f"File not found: {reference_file_path} or {test_file_path}")
        continue

    # Extract and z-score normalize F0
    f0_ref  = extract_f0(reference_file_path)
    f0_test = extract_f0(test_file_path)
    f0_ref  = f0_ref[~np.isnan(f0_ref)]
    f0_test = f0_test[~np.isnan(f0_test)]
    f0_ref_z  = stats.zscore(f0_ref)
    f0_test_z = stats.zscore(f0_test)

    # Compute DTW accumulated cost matrix and best warping path
    distance, path_matrix = dtw.warping_paths(f0_ref_z, f0_test_z)
    best_path = np.array(dtw.best_path(path_matrix))

    # Clip -inf sentinels used by dtaidistance for display
    disp_matrix = np.where(path_matrix < 0, np.nan, path_matrix)

    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    fig.suptitle(
        f'Standard DTW  |  "{reference_file_sent}"  |  {test_file_name} vs NS {reference_file_num}',
        fontsize=12, fontweight='bold'
    )

    # Left panel: F0 pitch contours
    ax1 = axes[0]
    t_ref  = np.linspace(0, 1, len(f0_ref_z))
    t_test = np.linspace(0, 1, len(f0_test_z))
    ax1.plot(t_ref,  f0_ref_z,  color='steelblue', linewidth=2, label='NS Reference')
    ax1.plot(t_test, f0_test_z, color='tomato',    linewidth=2, label='Student')
    ax1.set_xlabel('Normalized Time')
    ax1.set_ylabel('F0 (z-score)')
    ax1.set_title('F0 Pitch Contours')
    ax1.legend()
    ax1.grid(True, alpha=0.3)

    # Right panel: DTW accumulated cost matrix contour with warping path
    ax2 = axes[1]
    x_idx  = np.arange(disp_matrix.shape[1])
    y_idx  = np.arange(disp_matrix.shape[0])
    masked = np.ma.array(disp_matrix, mask=np.isnan(disp_matrix))
    cf = ax2.contourf(x_idx, y_idx, masked, levels=20, cmap='YlOrRd')
    ax2.plot(best_path[:, 1], best_path[:, 0], 'b-',  linewidth=2,  label='Warping Path')
    ax2.plot(best_path[ 0, 1], best_path[ 0, 0], 'b^', markersize=8)
    ax2.plot(best_path[-1, 1], best_path[-1, 0], 'bo', markersize=8)
    plt.colorbar(cf, ax=ax2, label='Accumulated Cost')
    ax2.set_xlabel('Student Frame Index')
    ax2.set_ylabel('NS Reference Frame Index')
    ax2.set_title(f'DTW Cost Matrix  (distance = {distance:.3f})')
    ax2.legend(loc='lower right')

    plt.tight_layout()
    out_path = f'DTW_Output/Contours/DTW_{test_file_name[:-4]}.png'
    plt.savefig(out_path, dpi=150, bbox_inches='tight')
    plt.show()
    print(f"Saved: {out_path}")

10.7.3 Explanation

10.7.3.1 Output directory creation

os.makedirs('DTW_Output/Contours', exist_ok=True)

Creates the nested output directory if it does not already exist. exist_ok=True prevents an error if the directory is already present, making the script safe to re-run without manual cleanup.

10.7.3.2 Sentinel value removal

disp_matrix = np.where(path_matrix < 0, np.nan, path_matrix)

dtaidistance stores sentinel values of \(-\infty\) in cells that were not visited during computation (e.g., cells outside a Sakoe-Chiba constraint band). These negative values would collapse the colour scale of the contour plot to a single colour, so they are replaced with NaN before display. np.where(condition, x, y) acts as a vectorised conditional: it returns x where the condition is True and y elsewhere.

10.7.3.3 Figure layout

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.suptitle(...)

Creates a 1-row, 2-column figure approximately 14 inches wide and 5 inches tall. fig.suptitle places a shared title above both panels that identifies the sentence text and the exact files being compared.

10.7.3.4 Left panel — pitch contour overlay

t_ref  = np.linspace(0, 1, len(f0_ref_z))
t_test = np.linspace(0, 1, len(f0_test_z))
ax1.plot(t_ref,  f0_ref_z,  color='steelblue', linewidth=2, label='NS Reference')
ax1.plot(t_test, f0_test_z, color='tomato',    linewidth=2, label='Student')

np.linspace(0, 1, n) maps any sequence of length n to the unit interval [0, 1], making different-length contours visually comparable on the same x-axis. Without this normalisation, a longer student recording would appear wider than the NS reference, making shape comparison harder.

The two contours use contrasting colours (steel blue for NS, tomato for student) to allow side-by-side pitch pattern inspection without DTW alignment. Differences here that shrink after alignment (Stage 4 right panel) indicate timing differences, not intonation differences.

10.7.3.5 Right panel — cost matrix heatmap

masked = np.ma.array(disp_matrix, mask=np.isnan(disp_matrix))
cf = ax2.contourf(x_idx, y_idx, masked, levels=20, cmap='YlOrRd')

np.ma.array creates a masked array so NaN cells (sentinels) are rendered transparent rather than assigned a colour. contourf draws 20 filled contour bands using the YlOrRd colour map — yellow for low accumulated cost, dark red for high cost. Low-cost regions are where the two sequences are locally most similar.

ax2.plot(best_path[:, 1], best_path[:, 0], 'b-',  linewidth=2, label='Warping Path')
ax2.plot(best_path[ 0, 1], best_path[ 0, 0], 'b^', markersize=8)   # start triangle
ax2.plot(best_path[-1, 1], best_path[-1, 0], 'bo', markersize=8)   # end circle

best_path[:, 1] are the student (column) indices; best_path[:, 0] are the NS (row) indices. Note the axis order: x-axis = student frames, y-axis = NS frames. The triangle marks where alignment begins; the circle marks where it ends.

Reading the plot: A warping path that hugs the main diagonal indicates that the two speakers are well-aligned in time — similar speech rate and rhythm. Large horizontal excursions mean the student held a syllable while the NS moved on; large vertical excursions mean the opposite.

10.7.3.6 Colour bar and saving

plt.colorbar(cf, ax=ax2, label='Accumulated Cost')

Adds a labelled colour bar on the right of ax2 to decode the numeric cost values from the heatmap colours.

out_path = f'DTW_Output/Contours/DTW_{test_file_name[:-4]}.png'
plt.savefig(out_path, dpi=150, bbox_inches='tight')

test_file_name[:-4] strips the .wav extension so S01_Sent01.wav becomes DTW_S01_Sent01.png. dpi=150 produces medium-resolution output suitable for screen review and standard printing. bbox_inches='tight' trims empty margins around the figure.

10.8 Stage 4: DDTW Contour Plots

10.8.1 Overview

This stage mirrors Stage 3 but for DDTW. It introduces a new helper ddtw_with_matrix() that also returns the full accumulated cost matrix, and extends the left panel to show both raw F0 and its derivative simultaneously via a dual y-axis.

10.8.2 Full Code

import os
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

os.makedirs('DTW_Output/Contours', exist_ok=True)

def ddtw_with_matrix(x, y):
    """DDTW returning (distance, accumulated_cost_matrix, best_path)."""
    x_d = compute_derivative(x)
    y_d = compute_derivative(y)
    n, m = len(x), len(y)
    C = np.zeros((n, m))
    C[0, 0] = abs(x_d[0] - y_d[0])
    for i in range(1, n):
        C[i, 0] = C[i-1, 0] + abs(x_d[i] - y_d[0])
    for j in range(1, m):
        C[0, j] = C[0, j-1] + abs(x_d[0] - y_d[j])
    for i in range(1, n):
        for j in range(1, m):
            C[i, j] = abs(x_d[i] - y_d[j]) + min(C[i-1, j], C[i, j-1], C[i-1, j-1])
    i, j = n - 1, m - 1
    path = []
    while i > 0 and j > 0:
        path.append((i, j))
        mc = min(C[i-1, j], C[i, j-1], C[i-1, j-1])
        if   mc == C[i-1, j-1]: i -= 1; j -= 1
        elif mc == C[i-1, j]:   i -= 1
        else:                   j -= 1
    path.append((0, 0))
    path.reverse()
    return C[-1, -1], C, np.array(path)

for index, row in df_audio_list.iterrows():
    reference_file_num  = row['Sent_NO']
    reference_file_sent = row['Sentence']
    reference_file_path = os.path.join(reference_dir, f"{reference_file_num}.wav")
    test_file_name      = row['Audio_File']
    test_file_path      = os.path.join(test_files_dir, test_file_name)

    if not os.path.exists(reference_file_path) or not os.path.exists(test_file_path):
        print(f"File not found: {reference_file_path} or {test_file_path}")
        continue

    f0_ref  = extract_f0(reference_file_path)
    f0_test = extract_f0(test_file_path)
    f0_ref  = f0_ref[~np.isnan(f0_ref)]
    f0_test = f0_test[~np.isnan(f0_test)]
    f0_ref_z  = stats.zscore(f0_ref)
    f0_test_z = stats.zscore(f0_test)

    # Compute DDTW cost matrix and warping path
    distance, cost_matrix, best_path = ddtw_with_matrix(f0_ref_z, f0_test_z)

    # Derivative series for the left panel
    dx_ref  = compute_derivative(f0_ref_z)
    dx_test = compute_derivative(f0_test_z)

    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    fig.suptitle(
        f'DDTW  |  "{reference_file_sent}"  |  {test_file_name} vs NS {reference_file_num}',
        fontsize=12, fontweight='bold'
    )

    # Left panel: F0 contours + derivative contours (dual y-axis)
    ax1   = axes[0]
    ax1_r = ax1.twinx()
    t_ref  = np.linspace(0, 1, len(f0_ref_z))
    t_test = np.linspace(0, 1, len(f0_test_z))
    ax1.plot(t_ref,  f0_ref_z,  color='steelblue', linewidth=2,   label='NS F0')
    ax1.plot(t_test, f0_test_z, color='tomato',    linewidth=2,   label='Student F0')
    ax1_r.plot(t_ref,  dx_ref,  color='steelblue', linewidth=1.2, linestyle='--', alpha=0.6, label='NS dF0')
    ax1_r.plot(t_test, dx_test, color='tomato',    linewidth=1.2, linestyle='--', alpha=0.6, label='Student dF0')
    ax1.set_xlabel('Normalized Time')
    ax1.set_ylabel('F0 (z-score)')
    ax1_r.set_ylabel('dF0 (derivative)', color='gray')
    ax1.set_title('F0 and dF0 Contours')
    lines1, labels1 = ax1.get_legend_handles_labels()
    lines2, labels2 = ax1_r.get_legend_handles_labels()
    ax1.legend(lines1 + lines2, labels1 + labels2, fontsize=8, loc='upper right')
    ax1.grid(True, alpha=0.3)

    # Right panel: DDTW accumulated cost matrix contour with warping path
    ax2   = axes[1]
    x_idx = np.arange(cost_matrix.shape[1])
    y_idx = np.arange(cost_matrix.shape[0])
    cf = ax2.contourf(x_idx, y_idx, cost_matrix, levels=20, cmap='YlOrRd')
    ax2.plot(best_path[:, 1], best_path[:, 0], 'b-',  linewidth=2,  label='Warping Path')
    ax2.plot(best_path[ 0, 1], best_path[ 0, 0], 'b^', markersize=8)
    ax2.plot(best_path[-1, 1], best_path[-1, 0], 'bo', markersize=8)
    plt.colorbar(cf, ax=ax2, label='Accumulated Cost')
    ax2.set_xlabel('Student Frame Index')
    ax2.set_ylabel('NS Reference Frame Index')
    ax2.set_title(f'DDTW Cost Matrix  (distance = {distance:.3f})')
    ax2.legend(loc='lower right')

    plt.tight_layout()
    out_path = f'DTW_Output/Contours/DDTW_{test_file_name[:-4]}.png'
    plt.savefig(out_path, dpi=150, bbox_inches='tight')
    plt.show()
    print(f"Saved: {out_path}")

10.8.3 Explanation

10.8.3.1 `ddtw_with_matrix(x, y)` — extended DDTW helper

This function is algorithmically identical to ddtw() from Stage 2 but returns three values instead of two:

return C[-1, -1], C, np.array(path)
#       ^^^^^^^^  ^  ^^^^^^^^^^^^
#       distance  full matrix   warping path

The full cost matrix C is needed to render the heatmap in the right panel. In Stage 2, only the scalar distance was required for the Excel output, so C was deliberately not returned there.

10.8.3.2 Computing derivative series for display

dx_ref  = compute_derivative(f0_ref_z)
dx_test = compute_derivative(f0_test_z)

These derivative arrays are computed after z-scoring so their scale is consistent with the z-scored F0 on the primary axis. They are used exclusively for the left-panel visualisation — ddtw_with_matrix internally re-derives them for the cost calculation.

10.8.3.3 Dual y-axis left panel

ax1   = axes[0]
ax1_r = ax1.twinx()

ax1.twinx() creates a second y-axis (right) that shares the same x-axis as ax1 (left). This allows two different value ranges to coexist:

Axis	Lines	Style	Scale label
`ax1` (left y)	Raw z-scored F0	Solid, `linewidth=2`	F0 (z-score)
`ax1_r` (right y)	First-order derivative dF0	Dashed, `alpha=0.6`	dF0 (derivative)

Both axes use the same colours (steel blue = NS, tomato = student), linking the two representations visually. This lets you simultaneously read where the pitch is (solid lines) and whether it is rising or falling (dashed lines) at each moment in the utterance.

10.8.3.4 Combined legend for dual-axis plot

lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax1_r.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, fontsize=8, loc='upper right')

By default, a matplotlib legend only captures the lines belonging to its own Axes object. Because ax1 and ax1_r are two separate Axes objects, their handle lists must be retrieved separately and concatenated before passing to ax1.legend(). This produces a single legend box listing all four lines (NS F0, Student F0, NS dF0, Student dF0).

10.8.3.5 Right panel — DDTW cost matrix

cf = ax2.contourf(x_idx, y_idx, cost_matrix, levels=20, cmap='YlOrRd')

Unlike Stage 3, no sentinel masking is needed here because ddtw_with_matrix initialises C with np.zeros() and every cell is populated during the dynamic programming pass — there are no unvisited cells. The warping path overlay, colour bar, axis labels, and saving logic are identical to Stage 3.

10.8.3.6 Interpreting DDTW vs Standard DTW results side-by-side

Observation	Interpretation
Lower DDTW distance than DTW	Student matches the shape of intonation better than the absolute pitch level
Higher DDTW distance than DTW	Student’s pitch level is close but rises and falls occur at different moments
Diagonal warping path (both)	Good temporal alignment — similar speech rate and rhythm
Path deviates near a region	Timing mismatch at that part of the utterance (e.g., different pause length)
High-cost cluster in matrix	The two speakers diverge strongly in that local time region

10.9 Output Summary

File	Content
`DTW_Output/01_Standard_DTW_z.xlsx`	Per-pair Standard DTW distances on z-scored F0
`DTW_Output/02_DDTW_z.xlsx`	Per-pair DDTW distances on z-scored derivative F0
`DTW_Output/Contours/DTW_*.png`	Standard DTW two-panel visualisations
`DTW_Output/Contours/DDTW_*.png`	DDTW two-panel visualisations

10.1 Overview

10.2 Background

10.2.1 Dynamic Time Warping (DTW)

10.2.2 Derivative DTW (DDTW)

10.2.3 F0 Extraction

10.2.4 Z-Score Normalisation

10.3 Dependencies and Installation

10.4 Data Import

10.4.1 Code

10.4.2 Explanation

10.5 Stage 1: Standard DTW

10.5.1 Overview

10.5.2 Full Code

10.5.3 Explanation

10.5.3.1 Import block

10.5.3.2 extract_f0(file_path) — F0 extraction helper

10.5.3.3 Processing loop

10.5.3.4 File existence check

10.5.3.5 NaN removal

10.5.3.6 Z-score normalisation

10.5.3.7 DTW computation

10.5.3.8 Results collection and export

10.5.4 Output columns

10.6 Stage 2: Derivative DTW (DDTW)

10.6.1 Overview

10.6.2 Full Code

10.6.3 Explanation

10.6.3.1 compute_derivative(x) — finite-difference derivative

10.6.3.2 ddtw(x, y) — custom DDTW implementation

10.6.3.3 Comparison with Standard DTW

10.7 Stage 3: DTW Contour Plots

10.7.1 Overview

10.7.2 Full Code

10.7.3 Explanation

10.7.3.1 Output directory creation

10.7.3.2 Sentinel value removal

10.7.3.3 Figure layout

10.7.3.4 Left panel — pitch contour overlay

10.7.3.5 Right panel — cost matrix heatmap

10.7.3.6 Colour bar and saving

10.8 Stage 4: DDTW Contour Plots

10.8.1 Overview

10.8.2 Full Code

10.8.3 Explanation

10.8.3.1 ddtw_with_matrix(x, y) — extended DDTW helper

10.8.3.2 Computing derivative series for display

10.8.3.3 Dual y-axis left panel

10.8.3.4 Combined legend for dual-axis plot

10.8.3.5 Right panel — DDTW cost matrix

10.8.3.6 Interpreting DDTW vs Standard DTW results side-by-side

10.9 Output Summary

10.5.3.2 `extract_f0(file_path)` — F0 extraction helper

10.6.3.1 `compute_derivative(x)` — finite-difference derivative

10.6.3.2 `ddtw(x, y)` — custom DDTW implementation

10.8.3.1 `ddtw_with_matrix(x, y)` — extended DDTW helper