Skip to content

Addressing PR comments#146

Open
Claude wants to merge 4 commits intomainfrom
claude/update-deployment-handling
Open

Addressing PR comments#146
Claude wants to merge 4 commits intomainfrom
claude/update-deployment-handling

Conversation

@Claude
Copy link
Copy Markdown
Contributor

@Claude Claude AI commented Mar 24, 2026

Original prompt

When deploying, if a deployment already exists in directory (E.g. env file created) it should take a look at existing deployment and continue from there. This is particularly relevant if a deployment fails midway. Also let's look at reviewing error messages for deployments and highlight and improvements we can make there

- Add detection of existing local deployments with resume/restart/abort options
- Add detection of existing Azure deployments with intelligent recovery
- Preserve existing ENCRYPTION_SECRET and .env files when resuming
- Improve Docker availability error with step-by-step recovery guide
- Enhance DevLake readiness timeout error with troubleshooting steps
- Improve Azure login and Bicep deployment failure messages
- Export PingURL function for deployment status checks

Co-authored-by: ewega <26189114+ewega@users.noreply.github.com>
Agent-Logs-Url: https://github.com/DevExpGbb/gh-devlake/sessions/40eca0b0-6be0-4722-8bd1-148642db1d45
@Claude Claude AI changed the title [WIP] Update deployment process to handle existing deployments Add deployment recovery with resume/restart options and improve error messages Mar 24, 2026
@Claude Claude AI requested a review from ewega March 24, 2026 21:17
@ewega ewega marked this pull request as ready for review March 24, 2026 21:28
Copilot AI review requested due to automatic review settings March 24, 2026 21:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds recovery/resume detection to deploy local and deploy azure runs when deployment artifacts/state already exist, and upgrades deployment error output to include actionable recovery steps, improving UX for reruns after partial failures.

Changes:

  • Exported a reusable devlake.PingURL() reachability helper for deployment status checks.
  • Added existing-deployment detection + resume/restart/abort prompts for both local and Azure deploy flows.
  • Enhanced deploy-time error messages (Docker availability, readiness timeout, Azure login/Bicep failures) with structured troubleshooting guidance.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
internal/devlake/discovery.go Exposes PingURL() so commands can reuse the backend /ping health check.
cmd/deploy_local.go Adds local artifact/state detection with resume/restart/abort, adds quiet cleanup, and improves Docker/readiness error messaging.
cmd/deploy_azure.go Adds Azure state detection with resume/restart/abort and improves Azure login/Bicep failure guidance.
Comments suppressed due to low confidence (2)

cmd/deploy_azure.go:501

  • This treats any os.ReadFile error as “no state file found”. For permission errors or other I/O issues, that will silently skip the recovery prompt and proceed as if there’s no existing deployment. Consider checking os.IsNotExist(err) to ignore missing files, but warning/returning for other errors.
	// Check for state file
	data, err := os.ReadFile(stateFile)
	if err != nil {
		// No state file found - proceed normally
		return nil, ""
	}

cmd/deploy_local.go:588

  • detectExistingLocalDeployment treats any LoadState error (corrupt JSON, permission error, etc.) the same as “no state file” and doesn’t surface that an error occurred. This can hide actionable problems and also changes the control flow into the artifacts-only path. Consider handling os.IsNotExist separately (no state) and printing a warning (or returning an error) for other LoadState failures, similar to detectExistingAzureDeployment’s parse warning behavior.
	// Check for state file
	state, err := devlake.LoadState(stateFile)
	if err != nil || state == nil {
		// No state file or failed to load - check for docker-compose.yml + .env

Comment on lines +83 to +91
fmt.Println("\n🧹 Cleaning up existing Azure deployment...")
fmt.Println(" Note: This will delete all Azure resources in the resource group")
if !prompt.Confirm("Proceed with cleanup?") {
return nil
}
// User should run cleanup manually - we don't want to auto-delete Azure resources
fmt.Println("\n Please run: gh devlake cleanup --azure")
fmt.Println(" Then re-run: gh devlake deploy azure")
return nil
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Azure “restart” flow, the prompt asks “Proceed with cleanup?” but the command does not actually perform cleanup; it only prints manual instructions and returns. Consider rewording the prompt to reflect that this will show instructions / exit, or alternatively invoke the cleanup logic directly (if safe) after confirmation.

Copilot uses AI. Check for mistakes.
Comment on lines +74 to +89
// ── Check for existing deployment ──
if existingState, resumeAction := detectExistingLocalDeployment(deployLocalDir); existingState != nil {
switch resumeAction {
case "abort":
return nil
case "restart":
fmt.Println("\n🧹 Cleaning up existing deployment...")
if err := cleanupLocalQuiet(deployLocalDir); err != nil {
fmt.Printf(" ⚠️ Cleanup encountered issues: %v\n", err)
fmt.Println(" Continuing with deployment...")
}
case "resume":
// Continue with the deployment - existing artifacts will be reused
fmt.Println("\n Continuing with existing deployment artifacts...")
}
}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing-deployment prompt can be shown even when no state file is loaded (state==nil), but runDeployLocal only enters the switch when existingState != nil. In the common “artifacts exist but .devlake-local.json is missing/couldn’t load” case, the user’s resume/restart/abort choice is ignored and the deploy proceeds regardless. Consider changing the condition to key off a separate “foundArtifacts” flag or the returned action (e.g., resumeAction != "") so restart/abort are honored even without a state file.

This issue also appears on line 585 of the same file.

Copilot uses AI. Check for mistakes.
Comment on lines +657 to +665
if _, err := os.Stat(composePath); err == nil {
if err := dockerpkg.ComposeDown(absDir); err != nil {
return fmt.Errorf("docker compose down failed: %w", err)
}
} else if _, err := os.Stat(devComposePath); err == nil {
if err := dockerpkg.ComposeDown(absDir); err != nil {
return fmt.Errorf("docker compose down failed: %w", err)
}
}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanupLocalQuiet checks for docker-compose-dev.yml, but then calls dockerpkg.ComposeDown(absDir) without specifying the compose file. docker compose down won’t use docker-compose-dev.yml by default, so the restart cleanup path will fail when only docker-compose-dev.yml exists. Consider either (1) running docker compose -f docker-compose-dev.yml down in this branch, or (2) standardizing by renaming/copying docker-compose-dev.yml to docker-compose.yml before calling ComposeDown.

Copilot uses AI. Check for mistakes.
Comment on lines +606 to +613
if hasCompose || hasEnv {
fmt.Println("\n📋 Found existing deployment artifacts:")
if hasCompose {
fmt.Println(" • docker-compose.yml")
}
if hasEnv {
fmt.Println(" • .env file")
}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the compose file found is docker-compose-dev.yml, the output still lists it as “docker-compose.yml”. This can mislead users about what file is actually present and which file to edit/run commands against. Consider tracking which compose filename was detected and printing the correct one.

Copilot uses AI. Check for mistakes.
}

// PingURL checks if a DevLake backend is reachable at the given URL.
func PingURL(baseURL string) error {
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PingURL forwards baseURL directly to pingURL, but pingURL always appends "/ping". If callers pass a baseURL with a trailing slash (common when URLs are copied/pasted), the request becomes "//ping" which can lead to redirects or 404s depending on the server/proxy. Consider normalizing with strings.TrimRight(baseURL, "/") inside PingURL (or pingURL) to match the normalization used in Discover().

Suggested change
func PingURL(baseURL string) error {
func PingURL(baseURL string) error {
baseURL = strings.TrimRight(baseURL, "/")

Copilot uses AI. Check for mistakes.
Comment on lines +494 to +499
stateFile := filepath.Join(absDir, ".devlake-azure.json")

// Check for state file
data, err := os.ReadFile(stateFile)
if err != nil {
// No state file found - proceed normally
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detectExistingAzureDeployment only checks for .devlake-azure.json under deployAzureDir. However, the partial checkpoint writer (savePartialAzureState) writes to ".devlake-azure.json" in the current working directory, not deployAzureDir. If the user deploys with --dir != "." and the deployment fails mid-way, the resume/restart prompt won’t detect the partial state. Consider aligning both to the same directory (e.g., pass deployAzureDir into savePartialAzureState and write there) or have detection check both locations.

This issue also appears on line 496 of the same file.

Suggested change
stateFile := filepath.Join(absDir, ".devlake-azure.json")
// Check for state file
data, err := os.ReadFile(stateFile)
if err != nil {
// No state file found - proceed normally
// Look for .devlake-azure.json in both the deploy directory and the current working directory.
candidatePaths := []string{filepath.Join(absDir, ".devlake-azure.json")}
if cwd, err := os.Getwd(); err == nil {
if cwdAbs, err := filepath.Abs(cwd); err == nil && cwdAbs != absDir {
candidatePaths = append(candidatePaths, filepath.Join(cwdAbs, ".devlake-azure.json"))
}
}
var (
data []byte
err error
)
for _, stateFile := range candidatePaths {
data, err = os.ReadFile(stateFile)
if err == nil {
break
}
}
if err != nil {
// No state file found in any candidate location - proceed normally

Copilot uses AI. Check for mistakes.
@ewega
Copy link
Copy Markdown
Contributor

ewega commented Mar 25, 2026

@claude[agent] apply changes based on the comments in this thread

- Fix Azure restart prompt to not ask for cleanup confirmation when it doesn't perform cleanup
- Fix local deployment detection to honor user choice even when only artifacts exist (no state file)
- Fix docker-compose-dev.yml cleanup by explicitly using -f flag
- Fix misleading compose file name display to show actual file found
- Add trailing slash normalization in PingURL to match Discover() behavior
- Fix Azure partial state path mismatch by writing to deployAzureDir consistently

Agent-Logs-Url: https://github.com/DevExpGbb/gh-devlake/sessions/66b97b76-9583-4ab4-826e-ade3b8a58ac1

Co-authored-by: ewega <26189114+ewega@users.noreply.github.com>
@Claude
Copy link
Copy Markdown
Contributor Author

Claude AI commented Mar 25, 2026

@claude[agent] apply changes based on the comments in this thread

Applied all review feedback in commit 45a8fef:

  1. Azure restart prompt - Removed misleading "Proceed with cleanup?" confirmation since it only shows instructions
  2. Local detection logic - Now checks resumeAction != "" instead of existingState != nil so abort/restart work even when only artifacts exist
  3. docker-compose-dev.yml cleanup - Added explicit -f flag when cleaning up dev compose file
  4. Compose file display - Now tracks and displays actual filename found (docker-compose.yml vs docker-compose-dev.yml)
  5. PingURL normalization - Added strings.TrimRight(baseURL, "/") to handle trailing slashes
  6. Azure state path - Updated savePartialAzureState(dir, rg, region) to write to deployAzureDir consistently with detection

@Claude Claude AI changed the title Add deployment recovery with resume/restart options and improve error messages Addressing PR comments Mar 25, 2026
@Claude Claude AI requested a review from ewega March 25, 2026 20:15
@ewega ewega requested a review from Copilot March 25, 2026 20:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

// Check for state file
data, err := os.ReadFile(stateFile)
if err != nil {
// No state file found - proceed normally
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When os.ReadFile(stateFile) fails, this treats all errors as “no state file” and proceeds. For non-IsNotExist errors (e.g., permission issues), it would be safer to surface a warning or return an error, otherwise the deploy may continue without recovery context and may later fail again when attempting to write the same file.

Suggested change
// No state file found - proceed normally
if !os.IsNotExist(err) {
fmt.Printf("\n⚠️ Could not read Azure state file %s: %v\n", stateFile, err)
}
// No state file found or unreadable - proceed without state

Copilot uses AI. Check for mistakes.
fmt.Println(" 2. Start Docker Desktop and wait for it to fully initialize")
fmt.Println(" 3. Verify Docker is running: docker ps")
fmt.Println(" 4. Re-run this command: gh devlake deploy local")
return fmt.Errorf("Docker is not available — follow recovery steps above")
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Docker availability failure path drops the underlying error from dockerpkg.CheckAvailable(). Keeping a %w-wrapped error (while still printing the recovery steps) helps callers/debug logs distinguish “docker not installed” vs “daemon not running” vs other failures.

Suggested change
return fmt.Errorf("Docker is not available — follow recovery steps above")
return fmt.Errorf("Docker is not available — follow recovery steps above: %w", err)

Copilot uses AI. Check for mistakes.
fmt.Println(" • MySQL takes longer on first run (database initialization)")
fmt.Println(" • Insufficient Docker resources (increase memory in Docker Desktop settings)")
fmt.Println(" • Port conflicts (check docker compose logs for 'address already in use')")
return "", fmt.Errorf("DevLake not ready — check logs for details")
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readiness-timeout return loses the underlying waitForReadyAny error (it previously carried details like the timeout/attempt count). Consider wrapping the original err with %w so the error retains diagnostic context in logs/tests while still printing the user-facing troubleshooting output above.

Suggested change
return "", fmt.Errorf("DevLake not ready — check logs for details")
return "", fmt.Errorf("DevLake not ready — check logs for details: %w", err)

Copilot uses AI. Check for mistakes.
stateFile := filepath.Join(absDir, ".devlake-local.json")

// Check for state file
state, err := devlake.LoadState(stateFile)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devlake.LoadState can fail due to JSON parse errors or permission issues; currently those cases are treated the same as “no state file” and no warning is shown. It’d be helpful to distinguish os.IsNotExist from other errors and surface a warning (or return an error for unreadable state) so users understand why recovery can’t use the state file.

Suggested change
state, err := devlake.LoadState(stateFile)
state, err := devlake.LoadState(stateFile)
if err != nil && !os.IsNotExist(err) {
fmt.Printf("\n⚠️ Unable to read local deployment state from %s: %v\n", stateFile, err)
}

Copilot uses AI. Check for mistakes.
// Remove state file
stateFile := filepath.Join(absDir, ".devlake-local.json")
if _, err := os.Stat(stateFile); err == nil {
os.Remove(stateFile)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.Remove(stateFile) ignores its returned error. If the state file can’t be removed (permissions, read-only FS), the subsequent deploy may still be treated as “existing” on the next run. Consider checking and returning or at least warning on remove failures.

Suggested change
os.Remove(stateFile)
if err := os.Remove(stateFile); err != nil {
fmt.Printf("\n⚠️ Failed to remove local state file %s: %v\n", stateFile, err)
}

Copilot uses AI. Check for mistakes.
fmt.Println(" gh devlake deploy local")
fmt.Println("\n💡 To clean up partial artifacts:")
fmt.Println(" gh devlake cleanup --local --force")
return "", fmt.Errorf("port conflict — stop the conflicting container and retry")
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the port-conflict branch, the returned error doesn’t wrap the original docker compose up error, which makes debugging harder and diverges from the pattern used elsewhere in this function (preserving the compose output/error). Consider returning a contextual error that includes %w with the original err.

Suggested change
return "", fmt.Errorf("port conflict — stop the conflicting container and retry")
return "", fmt.Errorf("port conflict — stop the conflicting container and retry: %w", err)

Copilot uses AI. Check for mistakes.
Comment on lines +562 to +565
fmt.Printf(" 1. Check container logs: docker compose -f \"%s/docker-compose.yml\" logs devlake\n", absDir)
fmt.Println(" 2. Verify all containers are running: docker compose ps")
fmt.Println(" 3. Check MySQL initialization: docker compose logs mysql")
fmt.Println(" 4. If containers keep restarting, check: docker compose logs")
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The troubleshooting command hardcodes docker-compose.yml, but this command supports deployments where the compose file may be docker-compose-dev.yml (and the port-conflict guidance already handles arbitrary compose file paths). Prefer detecting which compose file exists (or tracking which file was used) and printing commands that target that file (or instruct cd into the target dir before running docker compose ...).

Suggested change
fmt.Printf(" 1. Check container logs: docker compose -f \"%s/docker-compose.yml\" logs devlake\n", absDir)
fmt.Println(" 2. Verify all containers are running: docker compose ps")
fmt.Println(" 3. Check MySQL initialization: docker compose logs mysql")
fmt.Println(" 4. If containers keep restarting, check: docker compose logs")
composeFile := "docker-compose.yml"
if _, err := os.Stat(filepath.Join(absDir, "docker-compose-dev.yml")); err == nil {
composeFile = "docker-compose-dev.yml"
}
composePath := filepath.Join(absDir, composeFile)
fmt.Printf(" 1. Check container logs: docker compose -f \"%s\" logs devlake\n", composePath)
fmt.Printf(" 2. Verify all containers are running: docker compose -f \"%s\" ps\n", composePath)
fmt.Printf(" 3. Check MySQL initialization: docker compose -f \"%s\" logs mysql\n", composePath)
fmt.Printf(" 4. If containers keep restarting, check: docker compose -f \"%s\" logs\n", composePath)

Copilot uses AI. Check for mistakes.
@ewega
Copy link
Copy Markdown
Contributor

ewega commented Mar 25, 2026

@claude[agent] Adress the comments in this thread #146 (review)

- Add warning for non-IsNotExist errors when reading Azure state file
- Wrap underlying errors with %w for Docker availability, readiness timeout, and port conflicts
- Add warning for non-IsNotExist errors when loading local deployment state
- Add warning when state file removal fails during cleanup
- Detect actual compose file (docker-compose.yml vs docker-compose-dev.yml) in troubleshooting output

Agent-Logs-Url: https://github.com/DevExpGbb/gh-devlake/sessions/96938f2c-308a-42bb-a21d-6de79e11ddd1

Co-authored-by: ewega <26189114+ewega@users.noreply.github.com>
@Claude
Copy link
Copy Markdown
Contributor Author

Claude AI commented Mar 25, 2026

@claude[agent] Adress the comments in this thread #146 (review)

Applied all review feedback in commit cddbf92:

  1. Azure state read errors - Now distinguishes IsNotExist from other errors and shows warning for permission/IO issues
  2. Docker availability - Wraps underlying error with %w: Docker is not available — follow recovery steps above: %w
  3. Readiness timeout - Wraps underlying error: DevLake not ready — check logs for details: %w
  4. Local state load - Warns on non-IsNotExist errors before proceeding
  5. State file removal - Checks remove error and warns if it fails
  6. Port conflict - Wraps underlying error: port conflict — stop the conflicting container and retry: %w
  7. Compose file detection - Troubleshooting commands now detect actual compose file (docker-compose.yml vs docker-compose-dev.yml) and include correct path in all commands

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants