
Output Files

Understanding the files generated by LMM-Vibes analysis.

When you run explain() or label() with an output_dir, LMM-Vibes saves comprehensive results across multiple file formats. This guide explains each output file and how to use them.

Core Output Files

1. clustered_results.parquet

Primary results file with all analysis data

This is the main output file containing your original data enriched with extracted properties and cluster assignments.

import pandas as pd

# Load complete results
df = pd.read_parquet("results/clustered_results.parquet")

# Key columns added by LMM-Vibes:
print(df.columns)
# ['question_id', 'model', 'model_response',           # Original data
#  'property_description', 'property_evidence',       # Extracted properties  
#  'property_description_cluster_id',                  # Cluster assignments
#  'property_description_cluster_label',               # Human-readable cluster names
#  'property_description_coarse_cluster_id',           # Hierarchical clusters (if enabled)
#  'property_description_coarse_cluster_label']        # Coarse cluster names

# Example: Find all responses in a specific cluster
cluster_data = df[df['property_description_cluster_label'] == 'Detailed Technical Explanations']

Use this file for:

  • Interactive analysis and visualization
  • Building custom dashboards
  • Statistical analysis of results
  • Feeding into downstream ML pipelines
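For instance, a model-by-cluster table built from this file shows which behavioral patterns dominate each model's responses. A minimal sketch using only the columns listed above:

import pandas as pd

df = pd.read_parquet("results/clustered_results.parquet")

# Count how often each model falls into each behavioral cluster
counts = pd.crosstab(df["model"], df["property_description_cluster_label"])

# Convert counts to per-model proportions so models with different
# numbers of responses can be compared fairly
proportions = counts.div(counts.sum(axis=1), axis=0)
print(proportions.round(3))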

2. model_stats.json

Model performance metrics and rankings

Contains statistical summaries and performance metrics for each model analyzed.

import json

# Load model statistics
with open("results/model_stats.json", "r") as f:
    model_stats = json.load(f)

# Structure depends on your analysis:
print(model_stats.keys())

# For functional metrics format:
functional_metrics = model_stats["functional_metrics"]
model_scores = functional_metrics["model_scores"]
cluster_scores = functional_metrics["cluster_scores"]

# Example: Get quality scores for each model
for model_name, model_data in model_scores.items():
    quality_scores = model_data["quality"]
    print(f"{model_name}: {quality_scores}")

Use this file for:

  • Model leaderboards and rankings
  • Performance comparisons
  • Quality assessment reports
  • Automated model selection
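To illustrate the leaderboard use case, here is a minimal sketch that ranks models by mean quality score. It assumes the functional-metrics structure shown above, where each model's "quality" entry maps metric names to numeric scores:

import json

with open("results/model_stats.json") as f:
    model_stats = json.load(f)

model_scores = model_stats["functional_metrics"]["model_scores"]

# Average each model's quality metrics
# (assumes "quality" is a dict mapping metric name -> score)
leaderboard = []
for model_name, model_data in model_scores.items():
    quality = model_data["quality"]
    leaderboard.append((model_name, sum(quality.values()) / len(quality)))

# Sort descending to get a simple ranking
ranked = sorted(leaderboard, key=lambda item: item[1], reverse=True)
for rank, (model_name, avg_quality) in enumerate(ranked, start=1):
    print(f"{rank}. {model_name}: {avg_quality:.3f}")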

3. full_dataset.json

Complete dataset for reanalysis and caching

Contains the entire PropertyDataset object with all conversations, properties, clusters, and metadata.

from lmmvibes.core.data_objects import PropertyDataset

# Load complete dataset
dataset = PropertyDataset.load("results/full_dataset.json")

# Access all components:
print(f"Conversations: {len(dataset.conversations)}")
print(f"Properties: {len(dataset.properties)}")  
print(f"Clusters: {len(dataset.clusters)}")
print(f"Models: {dataset.all_models}")

# Rerun metrics with different parameters
from lmmvibes import compute_metrics_only
clustered_df, new_stats = compute_metrics_only(
    "results/full_dataset.json",
    method="single_model",
    output_dir="results_updated/"
)

Use this file for:

  • Recomputing metrics without re-extracting properties
  • Debugging and troubleshooting
  • Building analysis pipelines
  • Sharing complete analysis state

Additional Output Files

Processing Stage Files

Property Extraction:

  • raw_properties.jsonl - Raw LLM responses before parsing
  • extraction_stats.json - API call statistics and timing
  • extraction_samples.jsonl - Sample inputs/outputs for debugging

JSON Parsing:

  • parsed_properties.jsonl - Successfully parsed property objects
  • parsing_stats.json - Parsing success/failure statistics
  • parsing_failures.jsonl - Failed parsing attempts for debugging

Validation:

  • validated_properties.jsonl - Properties that passed validation
  • validation_stats.json - Validation statistics

Clustering:

  • embeddings.parquet - Property embeddings (if include_embeddings=True)
  • clustered_results_lightweight.jsonl - Results without embeddings
  • summary_table.jsonl - Cluster summary statistics

Metrics:

  • model_cluster_scores.json - Per model-cluster performance
  • cluster_scores.json - Aggregate cluster metrics
  • model_scores.json - Aggregate model metrics
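All of these stage files use the standard JSON/JSONL conventions described under File Format Details below, so they can be inspected directly. A small debugging sketch (paths and record fields depend on how your run laid out the output directory):

import json
import pandas as pd

# Each JSONL line is one record; load failed parses to see what went wrong
failures = pd.read_json("results/parsing_failures.jsonl", lines=True)
print(f"Failed parsing attempts: {len(failures)}")

# Aggregate metrics are plain JSON files
with open("results/model_cluster_scores.json") as f:
    model_cluster_scores = json.load(f)
print(f"Model-cluster entries: {len(model_cluster_scores)}")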

Summary Files

summary.txt - Human-readable analysis summary

LMM-Vibes Results Summary
==================================================

Total conversations: 1,234
Total properties: 4,567  
Models analyzed: 8
Fine clusters: 23
Coarse clusters: 8

Model Rankings (by average quality score):
  1. gpt-4: 0.847
  2. claude-3: 0.832
  3. gemini-pro: 0.801
  ...

Working with Output Files

Loading Results for Analysis

import pandas as pd
import json

# Quick analysis workflow
df = pd.read_parquet("results/clustered_results.parquet")
with open("results/model_stats.json") as f:
    stats = json.load(f)

# Analyze cluster distributions
cluster_counts = df['property_description_cluster_label'].value_counts()
print("Top behavioral patterns:")
print(cluster_counts.head(10))

# Compare models within specific clusters  
for cluster in cluster_counts.head(5).index:
    cluster_data = df[df['property_description_cluster_label'] == cluster]
    model_dist = cluster_data['model'].value_counts()
    print(f"\n{cluster}:")
    print(model_dist)

Rerunning Analysis

from lmmvibes import compute_metrics_only

# Recompute metrics with different parameters
clustered_df, model_stats = compute_metrics_only(
    input_path="results/full_dataset.json",
    method="single_model", 
    metrics_kwargs={
        'compute_confidence_intervals': True,
        'bootstrap_samples': 1000
    },
    output_dir="results_with_ci/"
)

Building Dashboards

# Interactive visualization with Streamlit
import pandas as pd
import streamlit as st

# Load results
@st.cache_data
def load_data():
    return pd.read_parquet("results/clustered_results.parquet")

df = load_data()

# Build interactive filters
selected_models = st.multiselect("Select Models", df['model'].unique())
selected_clusters = st.multiselect("Select Clusters", df['property_description_cluster_label'].unique())

# Filter and display
filtered_df = df[
    (df['model'].isin(selected_models)) & 
    (df['property_description_cluster_label'].isin(selected_clusters))
]
st.dataframe(filtered_df)

File Format Details

Parquet vs JSON vs JSONL

Parquet (.parquet)

  • Binary format, fastest loading
  • Preserves data types
  • Best for analysis and large datasets
  • Use: pd.read_parquet()

JSON (.json)

  • Human-readable structure
  • Good for configuration and metadata
  • Use: json.load()

JSONL (.jsonl)

  • Newline-delimited JSON
  • Streamable for large datasets
  • Each line is a JSON object
  • Use: pd.read_json(..., lines=True)
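A combined loading pattern for the three formats, using file names from the examples above:

import json
import pandas as pd

# Parquet: typed columns, fastest to load
df = pd.read_parquet("results/clustered_results.parquet")

# JSON: nested statistics and metadata
with open("results/model_stats.json") as f:
    stats = json.load(f)

# JSONL: one JSON object per line, streamed into a DataFrame
summary = pd.read_json("results/summary_table.jsonl", lines=True)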

Best Practices

1. File Organization

results/
├── clustered_results.parquet      # Primary analysis file
├── model_stats.json               # Model rankings  
├── full_dataset.json              # Complete state
├── summary.txt                    # Human summary
├── embeddings.parquet             # Embeddings (optional)
└── stage_outputs/                 # Detailed processing files
    ├── parsed_properties.jsonl
    ├── validation_stats.json
    └── ...

2. Version Control

  • Include summary.txt and model_stats.json in version control
  • Use .gitignore for large binary files like embeddings
  • Tag important analysis runs

3. Reproducibility

  • Save the exact command/parameters used (see the sketch after this list)
  • Keep full_dataset.json for reanalysis
  • Document any post-processing steps
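One lightweight way to record the exact parameters is to write them next to the outputs. A minimal sketch; the parameter names here are illustrative, not an official LMM-Vibes schema:

import json
from pathlib import Path

# Illustrative record of the parameters used for this run (not an official schema)
run_config = {
    "method": "single_model",
    "metrics_kwargs": {"compute_confidence_intervals": True, "bootstrap_samples": 1000},
    "output_dir": "results/",
}

output_dir = Path("results")
output_dir.mkdir(parents=True, exist_ok=True)
with open(output_dir / "run_config.json", "w") as f:
    json.dump(run_config, f, indent=2)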

Next Steps