Output Files
Understanding the files generated by LMM-Vibes analysis.
When you run explain() or label() with an output_dir, LMM-Vibes saves comprehensive results across multiple file formats. This guide explains each output file and how to use them.
Core Output Files
1. clustered_results.parquet
Primary results file with all analysis data
This is the main output file containing your original data enriched with extracted properties and cluster assignments.
import pandas as pd
# Load complete results
df = pd.read_parquet("results/clustered_results.parquet")
# Key columns added by LMM-Vibes:
print(df.columns)
# ['question_id', 'model', 'model_response', # Original data
# 'property_description', 'property_evidence', # Extracted properties
# 'property_description_cluster_id', # Cluster assignments
# 'property_description_cluster_label', # Human-readable cluster names
# 'property_description_coarse_cluster_id', # Hierarchical clusters (if enabled)
# 'property_description_coarse_cluster_label'] # Coarse cluster names
# Example: Find all responses in a specific cluster
cluster_data = df[df['property_description_cluster_label'] == 'Detailed Technical Explanations']
Use this file for:
- Interactive analysis and visualization
- Building custom dashboards
- Statistical analysis of results
- Feeding into downstream ML pipelines
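For a quick sanity check, you can cross-tabulate models against cluster labels to see how often each model exhibits each behavioral pattern. This is a minimal sketch that relies only on the model and property_description_cluster_label columns listed above:
import pandas as pd
df = pd.read_parquet("results/clustered_results.parquet")
# Count how often each model lands in each fine-grained cluster
pattern_counts = pd.crosstab(df["model"], df["property_description_cluster_label"])
print(pattern_counts)
# Normalize per model to compare relative frequencies across models
print(pattern_counts.div(pattern_counts.sum(axis=1), axis=0).round(3))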
2. model_stats.json
Model performance metrics and rankings
Contains statistical summaries and performance metrics for each model analyzed.
import json
# Load model statistics
with open("results/model_stats.json", "r") as f:
model_stats = json.load(f)
# Structure depends on your analysis:
print(model_stats.keys())
# For functional metrics format:
functional_metrics = model_stats["functional_metrics"]
model_scores = functional_metrics["model_scores"]
cluster_scores = functional_metrics["cluster_scores"]
# Example: Get quality scores for each model
for model_name, model_data in model_scores.items():
    quality_scores = model_data["quality"]
    print(f"{model_name}: {quality_scores}")
Use this file for:
- Model leaderboards and rankings
- Performance comparisons
- Quality assessment reports
- Automated model selection
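As an illustration, a small leaderboard can be derived by averaging each model's quality scores. This is a hedged sketch: it assumes each model's "quality" entry is a mapping of metric names to numeric scores, which depends on how your metrics were configured.
# Sketch only: assumes model_data["quality"] is a dict of metric name -> numeric score
leaderboard = {
    name: sum(data["quality"].values()) / len(data["quality"])
    for name, data in model_scores.items()
    if data.get("quality")
}
for rank, (name, score) in enumerate(
        sorted(leaderboard.items(), key=lambda kv: kv[1], reverse=True), start=1):
    print(f"{rank}. {name}: {score:.3f}")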
3. full_dataset.json
Complete dataset for reanalysis and caching
Contains the entire PropertyDataset object with all conversations, properties, clusters, and metadata.
from lmmvibes.core.data_objects import PropertyDataset
# Load complete dataset
dataset = PropertyDataset.load("results/full_dataset.json")
# Access all components:
print(f"Conversations: {len(dataset.conversations)}")
print(f"Properties: {len(dataset.properties)}")
print(f"Clusters: {len(dataset.clusters)}")
print(f"Models: {dataset.all_models}")
# Rerun metrics with different parameters
from lmmvibes import compute_metrics_only
clustered_df, new_stats = compute_metrics_only(
    "results/full_dataset.json",
    method="single_model",
    output_dir="results_updated/"
)
Use this file for:
- Recomputing metrics without re-extracting properties
- Debugging and troubleshooting
- Building analysis pipelines
- Sharing complete analysis state
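If you need to debug individual records, the sketch below inspects one extracted property without assuming its exact field names (the attributes of the property objects depend on the lmmvibes data classes):
from lmmvibes.core.data_objects import PropertyDataset
dataset = PropertyDataset.load("results/full_dataset.json")
# Peek at one property; vars() avoids hard-coding field names we can't be sure of
props = list(dataset.properties)
if props:
    print(vars(props[0]))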
Additional Output Files
Processing Stage Files
Property Extraction:
- raw_properties.jsonl - Raw LLM responses before parsing
- extraction_stats.json - API call statistics and timing
- extraction_samples.jsonl - Sample inputs/outputs for debugging
JSON Parsing:
- parsed_properties.jsonl - Successfully parsed property objects
- parsing_stats.json - Parsing success/failure statistics
- parsing_failures.jsonl - Failed parsing attempts for debugging
Validation:
- validated_properties.jsonl - Properties that passed validation
- validation_stats.json - Validation statistics
Clustering:
- embeddings.parquet - Property embeddings (if include_embeddings=True)
- clustered_results_lightweight.jsonl - Results without embeddings
- summary_table.jsonl - Cluster summary statistics
Metrics:
- model_cluster_scores.json - Per model-cluster performance
- cluster_scores.json - Aggregate cluster metrics
- model_scores.json - Aggregate model metrics
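All of these stage files are plain JSON or JSONL, so they can be inspected directly. A minimal sketch for debugging a parsing issue (paths are illustrative; your run may nest these files under a subdirectory such as stage_outputs/):
import json
import pandas as pd
# Parsing statistics: a single JSON object
with open("results/parsing_stats.json") as f:
    print(json.load(f))
# Failed parsing attempts: one JSON object per line
failures = pd.read_json("results/parsing_failures.jsonl", lines=True)
print(f"Parsing failures: {len(failures)}")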
Summary Files
summary.txt - Human-readable analysis summary
LMM-Vibes Results Summary
==================================================
Total conversations: 1,234
Total properties: 4,567
Models analyzed: 8
Fine clusters: 23
Coarse clusters: 8
Model Rankings (by average quality score):
1. gpt-4: 0.847
2. claude-3: 0.832
3. gemini-pro: 0.801
...
Working with Output Files
Loading Results for Analysis
import pandas as pd
import json
# Quick analysis workflow
df = pd.read_parquet("results/clustered_results.parquet")
with open("results/model_stats.json") as f:
stats = json.load(f)
# Analyze cluster distributions
cluster_counts = df['property_description_cluster_label'].value_counts()
print("Top behavioral patterns:")
print(cluster_counts.head(10))
# Compare models within specific clusters
for cluster in cluster_counts.head(5).index:
    cluster_data = df[df['property_description_cluster_label'] == cluster]
    model_dist = cluster_data['model'].value_counts()
    print(f"\n{cluster}:")
    print(model_dist)
Rerunning Analysis
from lmmvibes import compute_metrics_only
# Recompute metrics with different parameters
clustered_df, model_stats = compute_metrics_only(
    input_path="results/full_dataset.json",
    method="single_model",
    metrics_kwargs={
        'compute_confidence_intervals': True,
        'bootstrap_samples': 1000
    },
    output_dir="results_with_ci/"
)
Building Dashboards
# Interactive visualization
import pandas as pd
import streamlit as st
# Load results once and cache them across Streamlit reruns
@st.cache_data
def load_data():
    return pd.read_parquet("results/clustered_results.parquet")
df = load_data()
# Build interactive filters
selected_models = st.multiselect("Select Models", df['model'].unique())
selected_clusters = st.multiselect("Select Clusters", df['property_description_cluster_label'].unique())
# Filter and display
filtered_df = df[
    (df['model'].isin(selected_models)) &
    (df['property_description_cluster_label'].isin(selected_clusters))
]
st.dataframe(filtered_df)
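Save the script above as, for example, dashboard.py (any file name works) and launch it with streamlit run dashboard.py; Streamlit serves the dashboard in your browser.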
File Format Details
Parquet vs JSON vs JSONL
Parquet (.parquet)
- Binary format, fastest loading
- Preserves data types
- Best for analysis and large datasets
- Use: pd.read_parquet()
JSON (.json)
- Human-readable structure
- Good for configuration and metadata
- Use: json.load()
JSONL (.jsonl)
- Newline-delimited JSON
- Streamable for large datasets
- Each line is a JSON object
- Use: pd.read_json(..., lines=True)
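For example, a JSONL file such as clustered_results_lightweight.jsonl can be loaded into a DataFrame in one call or streamed record by record to keep memory usage low (a minimal sketch; adjust the path to your output directory):
import json
import pandas as pd
# Whole file into a DataFrame
light_df = pd.read_json("results/clustered_results_lightweight.jsonl", lines=True)
# Or stream it line by line
with open("results/clustered_results_lightweight.jsonl") as f:
    for line in f:
        record = json.loads(line)
        print(sorted(record.keys()))
        break  # inspect just the first record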
Best Practices
1. File Organization
results/
├── clustered_results.parquet # Primary analysis file
├── model_stats.json # Model rankings
├── full_dataset.json # Complete state
├── summary.txt # Human summary
├── embeddings.parquet # Embeddings (optional)
└── stage_outputs/ # Detailed processing files
├── parsed_properties.jsonl
├── validation_stats.json
└── ...
2. Version Control
- Include summary.txt and model_stats.json in version control
- Use .gitignore for large binary files like embeddings (see the sketch below)
- Tag important analysis runs
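A minimal .gitignore sketch for this layout (the paths are illustrative; match them to your actual output_dir):
# Ignore large binary analysis outputs
results/*.parquet
results/stage_outputs/
# summary.txt and model_stats.json remain tracked for review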
3. Reproducibility
- Save the exact command/parameters used (see the sketch below)
- Keep full_dataset.json for reanalysis
- Document any post-processing steps
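One lightweight way to record the command and parameters is to write them into the output directory alongside the results. The run_config.json name below is a hypothetical convention, not a file LMM-Vibes writes for you:
import json
# Hypothetical convention: store the parameters you passed to explain()/label()
run_params = {
    "method": "single_model",
    "output_dir": "results/",
    "notes": "baseline run",
}
with open("results/run_config.json", "w") as f:
    json.dump(run_params, f, indent=2)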
Next Steps
- Use the quickstart guide to generate these files
- Learn about explain() and label() functions
- Explore visualization options