A three-level framework for evaluating LLM-based agents in Model-Driven Engineering (MDE) contexts.
Install dependencies:
```bash
pip install -r requirements.txt
```

Set OpenAI API key in `.env` file at project root:

```
OPENAI_API_KEY=your_api_key_here
```

The megamodel provides descriptions of LLM-based agents, associated tools, underlying artifacts/models, and execution traces. It is organized in four parts: core artifacts (models, metamodels, transformation models), tooling artifacts (tools, servers), agent artifacts (agents, workflows, steps), and execution traces (traces, invocations).
- Implementation: `src/core/megamodel.py`
- Population: the megamodel is populated at agent initialization via `populate_registry()` (see `scripts/run_agent_versions.py`)
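A minimal sketch of a registry split into the four parts described above, assuming a flat store keyed by artifact name. `Artifact`, `Megamodel`, `PARTS`, the kind strings, and the example entries are all hypothetical; they are not taken from `src/core/megamodel.py` or from `populate_registry()`.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """A registry entry (hypothetical schema); `kind` decides which part it belongs to."""
    name: str
    kind: str
    refs: list[str] = field(default_factory=list)  # names of related entries


# Hypothetical mapping from each megamodel part to the entry kinds it holds.
PARTS = {
    "core": {"model", "metamodel", "transformation_model"},
    "tooling": {"tool", "server"},
    "agent": {"agent", "workflow", "step"},
    "trace": {"trace", "invocation"},
}


class Megamodel:
    def __init__(self) -> None:
        self.entries: dict[str, Artifact] = {}

    def register(self, artifact: Artifact) -> None:
        # Reject kinds that belong to none of the four parts.
        if not any(artifact.kind in kinds for kinds in PARTS.values()):
            raise ValueError(f"unknown artifact kind: {artifact.kind}")
        self.entries[artifact.name] = artifact

    def part(self, part_name: str) -> list[Artifact]:
        # All entries whose kind falls in the requested part, in insertion order.
        kinds = PARTS[part_name]
        return [a for a in self.entries.values() if a.kind in kinds]


mm = Megamodel()
mm.register(Artifact("Families.ecore", "metamodel"))
mm.register(Artifact("Families2Persons.atl", "transformation_model", ["Families.ecore"]))
mm.register(Artifact("atl_server", "server"))
mm.register(Artifact("apply_transformation", "tool", ["atl_server"]))
print([a.name for a in mm.part("tooling")])  # ['atl_server', 'apply_transformation']
```

Grouping kinds by part makes it cheap to ask for one slice of the megamodel, for example every tooling artifact an agent could call.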
Extracts megamodel structure from LLM execution traces to populate the registry with tool capabilities and transformation metadata.
- ATL Log Processing: `logs_injection/ATL/`
  - `extract_megamodel_structure.py` - Extracts megamodel entities from ATL traces
  - `complete_langsmith_trace.txt` - Raw LangSmith trace data
  - `langsmith_final_output.json` - Processed trace output
- EMF Log Processing: `logs_injection/EMF/`
  - `extract_emf_megamodel_structure.py` - Extracts megamodel entities from EMF traces
  - `emf_complete_trace_final.txt` - Raw EMF trace data
  - `emf_langsmith_final_output.json` - Processed trace output
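The extraction step can be pictured as a pass over the recorded runs that keeps only tool invocations and aggregates them per tool. A minimal sketch, assuming a hypothetical trace layout: a JSON list of runs with `run_type`, `name` and `inputs` keys, loosely modeled on LangSmith run records. The real formats of `complete_langsmith_trace.txt` and `emf_complete_trace_final.txt`, and the logic of the extraction scripts, may differ.

```python
import json
from collections import defaultdict


def extract_tool_entities(trace_json: str) -> dict[str, dict]:
    """Collect tool names, their argument names, and call counts from a trace.

    Assumes (hypothetically) a JSON list of run records, each with
    "run_type", "name" and "inputs" keys.
    """
    tools: dict[str, dict] = defaultdict(lambda: {"params": set(), "calls": 0})
    for run in json.loads(trace_json):
        if run.get("run_type") != "tool":
            continue  # skip LLM and chain runs; only tool calls describe capabilities
        entry = tools[run["name"]]
        entry["calls"] += 1
        entry["params"].update(run.get("inputs", {}).keys())
    # Plain dicts with sorted parameter lists give stable, comparable output.
    return {name: {"params": sorted(e["params"]), "calls": e["calls"]}
            for name, e in tools.items()}


sample = json.dumps([
    {"run_type": "llm", "name": "ChatOpenAI", "inputs": {"messages": []}},
    {"run_type": "tool", "name": "apply_transformation",
     "inputs": {"transformation": "Class2Relational", "input_model": "lib.xmi"}},
    {"run_type": "tool", "name": "apply_transformation",
     "inputs": {"transformation": "Families2Persons", "input_model": "fam.xmi"}},
])
print(extract_tool_entities(sample))
# {'apply_transformation': {'params': ['input_model', 'transformation'], 'calls': 2}}
```

The tool and transformation names in `sample` are illustrative only; each aggregated entry is the kind of capability record that could then be registered in the megamodel.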
Generates synthetic instruction-API datasets from domain-specific seeds using LLM augmentation.
- Generation Scripts: `dataset generation/generation_process/`
  - `ATL/pipeline.py` - ATL dataset generation pipeline
  - `ATL/single_tool_generate.py` - Single-tool instruction generation
  - `ATL/multi_tool_generate.py` - Multi-tool instruction generation
  - `EMF/emf_pipeline.py` - EMF dataset generation pipeline
  - `EMF/single_tool_generate.py` - EMF single-tool instruction generation
  - `EMF/emf_multi_tool_generate.py` - EMF multi-tool instruction generation
- Seed Instructions: `dataset generation/seeds/`
  - `model_management_seeds/` - Model management operation seeds
  - `model_transformation_seeds/` - Transformation operation seeds
- Generated Datasets: `dataset generation/outputs/`
  - `atl_tools/` - ATL transformation datasets
  - `uml/` - UML transformation datasets
  - `openRewrite/` - OpenRewrite refactoring datasets
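Seed-based augmentation generally means showing the model a few seed instruction/API pairs, asking for new ones, and filtering the output. A minimal sketch of that loop, assuming a hypothetical record format with `instruction` and `api` keys; the prompt wording, the case-insensitive duplicate filter, and `fake_llm` (an offline stand-in for a real model call) are illustrative and are not the logic of `pipeline.py` or `emf_pipeline.py`.

```python
import json
from typing import Callable


def build_prompt(seeds: list[dict], n: int) -> str:
    # Few-shot prompt: show the seed pairs, ask for n new ones as JSON lines.
    examples = "\n".join(json.dumps(s) for s in seeds)
    return (
        "Here are examples of user instructions and the tool API that fulfils them:\n"
        f"{examples}\n"
        f"Write {n} new, different examples in the same JSON format, one per line."
    )


def augment(seeds: list[dict], llm: Callable[[str], str], n: int = 5) -> list[dict]:
    """Generate new instruction/API pairs, dropping malformed lines and duplicates."""
    seen = {s["instruction"].strip().lower() for s in seeds}
    generated = []
    for line in llm(build_prompt(seeds, n)).splitlines():
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore prose or partial lines in the model output
        if not isinstance(item, dict) or not all(k in item for k in ("instruction", "api")):
            continue
        key = item["instruction"].strip().lower()
        if key not in seen:  # keep only instructions not already in seeds or output
            seen.add(key)
            generated.append(item)
    return generated


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call so the sketch runs offline.
    return "\n".join([
        '{"instruction": "Transform my Families model into Persons", "api": "apply_transformation"}',
        "Sure! Here are more examples:",
        '{"instruction": "List all available transformations", "api": "list_transformations"}',
    ])


seeds = [{"instruction": "List all available transformations", "api": "list_transformations"}]
print(augment(seeds, fake_llm))
```

Passing the model call in as a function keeps the filtering testable without network access; a real run would replace `fake_llm` with a call to the OpenAI API configured in `.env`.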
Validates dataset diversity using six metrics from dataset augmentation research.
- Analysis Script: `dataset generation/analyze_dataset_diversity.py`
- Visualization: `dataset generation/visualize_metrics.py`
- Metrics Calculated:
  - Distance (average pairwise Euclidean distance)
  - Dispersion (1 - average cosine similarity)
  - Isocontour Radius (geometric mean of per-dimension standard deviations)
  - Affinity (similarity between seed and augmented dataset means)
  - Vocabulary Size (unique words)
  - Unique 3-grams (distinct 3-word sequences)
- Output Charts: `dataset generation/experimentation_charts/`
  - `atl_tools/` - ATL diversity metric charts
  - `uml_tools/` - UML diversity metric charts
  - `openrewrite_tools/` - OpenRewrite diversity metric charts
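The six metrics can be computed with the standard library alone. A minimal sketch, assuming each instruction has already been turned into an embedding vector (toy 2-D vectors here; this README does not say which embedding model `analyze_dataset_diversity.py` uses). It also assumes population standard deviations for the isocontour radius, cosine similarity between mean vectors for affinity, and lowercased whitespace tokenization for the lexical metrics; these choices may not match the script.

```python
import math
import statistics
from itertools import combinations


def _cosine(u, v):
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def _mean(vecs):
    # Component-wise mean vector.
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def distance(vecs):
    # Average pairwise Euclidean distance (needs at least two vectors).
    pairs = list(combinations(vecs, 2))
    return sum(math.dist(u, v) for u, v in pairs) / len(pairs)


def dispersion(vecs):
    # 1 - average pairwise cosine similarity.
    pairs = list(combinations(vecs, 2))
    return 1 - sum(_cosine(u, v) for u, v in pairs) / len(pairs)


def isocontour_radius(vecs):
    # Geometric mean of per-dimension (population) standard deviations,
    # computed in log space so high-dimensional embeddings do not underflow.
    stds = [statistics.pstdev(col) for col in zip(*vecs)]
    if min(stds) == 0:
        return 0.0
    return math.exp(sum(math.log(s) for s in stds) / len(stds))


def affinity(seed_vecs, aug_vecs):
    # Assumed definition: cosine similarity between the two dataset means.
    return _cosine(_mean(seed_vecs), _mean(aug_vecs))


def vocabulary_size(texts):
    # Unique lowercased whitespace-separated words.
    return len({w for t in texts for w in t.lower().split()})


def unique_trigrams(texts):
    # Distinct 3-word sequences, counted within each text only.
    grams = set()
    for t in texts:
        words = t.lower().split()
        grams.update(zip(words, words[1:], words[2:]))
    return len(grams)


vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-D "embeddings"
texts = ["apply the Families2Persons transformation",
         "apply the Class2Relational transformation to my model"]
print(distance(vecs), dispersion(vecs), isocontour_radius(vecs))
print(vocabulary_size(texts), unique_trigrams(texts))  # 8 7
```

Distance, dispersion, and isocontour radius describe the spread of a single dataset, while affinity compares the augmented set against its seeds, so a high-diversity augmentation that drifts far from the seeds would show up as low affinity.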
Tests agent performance with reduced tool availability.
- Script: `scripts/run_agent_reduced_tools.py`
- Analysis: `dataset generation/ablation_test/instruction_analysis.py`
- Results: `dataset generation/ablation_test/`
- Coverage Charts: `dataset generation/ablation_test/plots/`
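One way to set up such an ablation is to keep a seeded random subset of tools and measure how many dataset instructions can still be solved with it. A minimal sketch under that assumption; the tool names, the instruction records with a `tools` list, and both helper functions are hypothetical and do not describe how `run_agent_reduced_tools.py` selects tools.

```python
import random

# Hypothetical tool names and instruction records for illustration.
TOOLS = ["apply_transformation", "list_transformations", "get_metamodel", "validate_model"]
INSTRUCTIONS = [
    {"text": "List the available transformations", "tools": ["list_transformations"]},
    {"text": "Transform families.xmi into a Persons model", "tools": ["apply_transformation"]},
    {"text": "Validate lib.xmi, then transform it", "tools": ["validate_model", "apply_transformation"]},
    {"text": "Show the Families metamodel", "tools": ["get_metamodel"]},
]


def reduce_tools(tools: list[str], keep_ratio: float, seed: int = 0) -> list[str]:
    """Keep a random fraction of the tools; a fixed seed makes ablation runs repeatable."""
    k = max(1, round(len(tools) * keep_ratio))
    return sorted(random.Random(seed).sample(tools, k))


def coverage(instructions: list[dict], available: list[str]) -> float:
    """Share of instructions whose required tools are all still available."""
    avail = set(available)
    return sum(set(i["tools"]) <= avail for i in instructions) / len(instructions)


kept = reduce_tools(TOOLS, keep_ratio=0.5, seed=1)
print(kept, coverage(INSTRUCTIONS, kept))
```

Comparing this static coverage with the agent's observed success on the same reduced tool set helps separate failures caused by missing tools from failures in the agent's own tool selection.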
Evaluates agent versions against generated datasets to measure accuracy and performance.
- Agent Versions: `regression_testing/agent_versions/` (`agent1.py` through `agent7.py`)
- Execution Script: `scripts/run_agent_versions.py`
- Results: `regression_testing/agent_version_logs/`
  - `version_1/` through `version_7/` - Execution logs per agent version
  - `report_generation.csv` - Augmented dataset results
  - `seeds_report_generation.csv` - Seed dataset results
- Evaluation: `regression_testing/evaluate_accuracy.py`
- Visualization: `regression_testing/visualize_accuracy_comparison.py`
- Output Plots: `regression_testing/plots/`
Servers expose tools via the Model Context Protocol (MCP) for agent execution.
- ATL Server: `mcp_servers/atl_server/` - Model transformations (includes UML transformations)
- EMF Server: `mcp_servers/emf_server/` - EMF model management operations
- OpenRewrite Server: `mcp_servers/openRewrite_servers/` - Java code refactoring and migration recipes
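MCP is built on JSON-RPC 2.0: a client discovers a server's tools with a `tools/list` request and invokes one with `tools/call`, passing the tool `name` and its `arguments`. The toy in-process dispatcher below only illustrates those message shapes. It skips MCP's initialization handshake, transports, and input-schema validation, the `list_transformations` tool is hypothetical, and it is not how the servers under `mcp_servers/` are implemented; a production server would typically be built on an MCP SDK.

```python
import json

# Hypothetical tool table: name -> (description, handler taking an "arguments" dict).
TOOLS = {
    "list_transformations": (
        "List the ATL transformations this server can run.",
        lambda arguments: ["Families2Persons", "Class2Relational"],
    ),
}


def _reply(req: dict, **body) -> str:
    # Wrap a result or error in a JSON-RPC 2.0 response carrying the request's id.
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), **body})


def handle(request_json: str) -> str:
    """Answer one JSON-RPC request for the tools/list or tools/call methods."""
    req = json.loads(request_json)
    method, params = req.get("method"), req.get("params", {})
    if method == "tools/list":
        tools = [{"name": name, "description": desc, "inputSchema": {"type": "object"}}
                 for name, (desc, _) in TOOLS.items()]
        return _reply(req, result={"tools": tools})
    if method == "tools/call":
        name = params.get("name")
        if name not in TOOLS:
            return _reply(req, error={"code": -32602, "message": f"Unknown tool: {name}"})
        output = TOOLS[name][1](params.get("arguments", {}))
        # Results are a list of content items; here one text item holding JSON.
        return _reply(req, result={"content": [{"type": "text", "text": json.dumps(output)}],
                                   "isError": False})
    return _reply(req, error={"code": -32601, "message": f"Method not found: {method}"})


call = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "list_transformations", "arguments": {}}}
print(handle(json.dumps(call)))
```

Because each server only advertises names, descriptions, and input schemas through `tools/list`, the agent can discover ATL, EMF, and OpenRewrite capabilities the same way, regardless of how each tool is implemented behind the server.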
