Conversation
- Add detection of existing local deployments with resume/restart/abort options - Add detection of existing Azure deployments with intelligent recovery - Preserve existing ENCRYPTION_SECRET and .env files when resuming - Improve Docker availability error with step-by-step recovery guide - Enhance DevLake readiness timeout error with troubleshooting steps - Improve Azure login and Bicep deployment failure messages - Export PingURL function for deployment status checks Co-authored-by: ewega <26189114+ewega@users.noreply.github.com> Agent-Logs-Url: https://github.com/DevExpGbb/gh-devlake/sessions/40eca0b0-6be0-4722-8bd1-148642db1d45
There was a problem hiding this comment.
Pull request overview
Adds recovery/resume detection to deploy local and deploy azure runs when deployment artifacts/state already exist, and upgrades deployment error output to include actionable recovery steps, improving UX for reruns after partial failures.
Changes:
- Exported a reusable
devlake.PingURL()reachability helper for deployment status checks. - Added existing-deployment detection + resume/restart/abort prompts for both local and Azure deploy flows.
- Enhanced deploy-time error messages (Docker availability, readiness timeout, Azure login/Bicep failures) with structured troubleshooting guidance.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| internal/devlake/discovery.go | Exposes PingURL() so commands can reuse the backend /ping health check. |
| cmd/deploy_local.go | Adds local artifact/state detection with resume/restart/abort, adds quiet cleanup, and improves Docker/readiness error messaging. |
| cmd/deploy_azure.go | Adds Azure state detection with resume/restart/abort and improves Azure login/Bicep failure guidance. |
Comments suppressed due to low confidence (2)
cmd/deploy_azure.go:501
- This treats any os.ReadFile error as “no state file found”. For permission errors or other I/O issues, that will silently skip the recovery prompt and proceed as if there’s no existing deployment. Consider checking os.IsNotExist(err) to ignore missing files, but warning/returning for other errors.
// Check for state file
data, err := os.ReadFile(stateFile)
if err != nil {
// No state file found - proceed normally
return nil, ""
}
cmd/deploy_local.go:588
- detectExistingLocalDeployment treats any LoadState error (corrupt JSON, permission error, etc.) the same as “no state file” and doesn’t surface that an error occurred. This can hide actionable problems and also changes the control flow into the artifacts-only path. Consider handling os.IsNotExist separately (no state) and printing a warning (or returning an error) for other LoadState failures, similar to detectExistingAzureDeployment’s parse warning behavior.
// Check for state file
state, err := devlake.LoadState(stateFile)
if err != nil || state == nil {
// No state file or failed to load - check for docker-compose.yml + .env
cmd/deploy_azure.go
Outdated
| fmt.Println("\n🧹 Cleaning up existing Azure deployment...") | ||
| fmt.Println(" Note: This will delete all Azure resources in the resource group") | ||
| if !prompt.Confirm("Proceed with cleanup?") { | ||
| return nil | ||
| } | ||
| // User should run cleanup manually - we don't want to auto-delete Azure resources | ||
| fmt.Println("\n Please run: gh devlake cleanup --azure") | ||
| fmt.Println(" Then re-run: gh devlake deploy azure") | ||
| return nil |
There was a problem hiding this comment.
In the Azure “restart” flow, the prompt asks “Proceed with cleanup?” but the command does not actually perform cleanup; it only prints manual instructions and returns. Consider rewording the prompt to reflect that this will show instructions / exit, or alternatively invoke the cleanup logic directly (if safe) after confirmation.
| // ── Check for existing deployment ── | ||
| if existingState, resumeAction := detectExistingLocalDeployment(deployLocalDir); existingState != nil { | ||
| switch resumeAction { | ||
| case "abort": | ||
| return nil | ||
| case "restart": | ||
| fmt.Println("\n🧹 Cleaning up existing deployment...") | ||
| if err := cleanupLocalQuiet(deployLocalDir); err != nil { | ||
| fmt.Printf(" ⚠️ Cleanup encountered issues: %v\n", err) | ||
| fmt.Println(" Continuing with deployment...") | ||
| } | ||
| case "resume": | ||
| // Continue with the deployment - existing artifacts will be reused | ||
| fmt.Println("\n Continuing with existing deployment artifacts...") | ||
| } | ||
| } |
There was a problem hiding this comment.
The existing-deployment prompt can be shown even when no state file is loaded (state==nil), but runDeployLocal only enters the switch when existingState != nil. In the common “artifacts exist but .devlake-local.json is missing/couldn’t load” case, the user’s resume/restart/abort choice is ignored and the deploy proceeds regardless. Consider changing the condition to key off a separate “foundArtifacts” flag or the returned action (e.g., resumeAction != "") so restart/abort are honored even without a state file.
This issue also appears on line 585 of the same file.
| if _, err := os.Stat(composePath); err == nil { | ||
| if err := dockerpkg.ComposeDown(absDir); err != nil { | ||
| return fmt.Errorf("docker compose down failed: %w", err) | ||
| } | ||
| } else if _, err := os.Stat(devComposePath); err == nil { | ||
| if err := dockerpkg.ComposeDown(absDir); err != nil { | ||
| return fmt.Errorf("docker compose down failed: %w", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
cleanupLocalQuiet checks for docker-compose-dev.yml, but then calls dockerpkg.ComposeDown(absDir) without specifying the compose file. docker compose down won’t use docker-compose-dev.yml by default, so the restart cleanup path will fail when only docker-compose-dev.yml exists. Consider either (1) running docker compose -f docker-compose-dev.yml down in this branch, or (2) standardizing by renaming/copying docker-compose-dev.yml to docker-compose.yml before calling ComposeDown.
| if hasCompose || hasEnv { | ||
| fmt.Println("\n📋 Found existing deployment artifacts:") | ||
| if hasCompose { | ||
| fmt.Println(" • docker-compose.yml") | ||
| } | ||
| if hasEnv { | ||
| fmt.Println(" • .env file") | ||
| } |
There was a problem hiding this comment.
When the compose file found is docker-compose-dev.yml, the output still lists it as “docker-compose.yml”. This can mislead users about what file is actually present and which file to edit/run commands against. Consider tracking which compose filename was detected and printing the correct one.
| } | ||
|
|
||
| // PingURL checks if a DevLake backend is reachable at the given URL. | ||
| func PingURL(baseURL string) error { |
There was a problem hiding this comment.
PingURL forwards baseURL directly to pingURL, but pingURL always appends "/ping". If callers pass a baseURL with a trailing slash (common when URLs are copied/pasted), the request becomes "//ping" which can lead to redirects or 404s depending on the server/proxy. Consider normalizing with strings.TrimRight(baseURL, "/") inside PingURL (or pingURL) to match the normalization used in Discover().
| func PingURL(baseURL string) error { | |
| func PingURL(baseURL string) error { | |
| baseURL = strings.TrimRight(baseURL, "/") |
cmd/deploy_azure.go
Outdated
| stateFile := filepath.Join(absDir, ".devlake-azure.json") | ||
|
|
||
| // Check for state file | ||
| data, err := os.ReadFile(stateFile) | ||
| if err != nil { | ||
| // No state file found - proceed normally |
There was a problem hiding this comment.
detectExistingAzureDeployment only checks for .devlake-azure.json under deployAzureDir. However, the partial checkpoint writer (savePartialAzureState) writes to ".devlake-azure.json" in the current working directory, not deployAzureDir. If the user deploys with --dir != "." and the deployment fails mid-way, the resume/restart prompt won’t detect the partial state. Consider aligning both to the same directory (e.g., pass deployAzureDir into savePartialAzureState and write there) or have detection check both locations.
This issue also appears on line 496 of the same file.
| stateFile := filepath.Join(absDir, ".devlake-azure.json") | |
| // Check for state file | |
| data, err := os.ReadFile(stateFile) | |
| if err != nil { | |
| // No state file found - proceed normally | |
| // Look for .devlake-azure.json in both the deploy directory and the current working directory. | |
| candidatePaths := []string{filepath.Join(absDir, ".devlake-azure.json")} | |
| if cwd, err := os.Getwd(); err == nil { | |
| if cwdAbs, err := filepath.Abs(cwd); err == nil && cwdAbs != absDir { | |
| candidatePaths = append(candidatePaths, filepath.Join(cwdAbs, ".devlake-azure.json")) | |
| } | |
| } | |
| var ( | |
| data []byte | |
| err error | |
| ) | |
| for _, stateFile := range candidatePaths { | |
| data, err = os.ReadFile(stateFile) | |
| if err == nil { | |
| break | |
| } | |
| } | |
| if err != nil { | |
| // No state file found in any candidate location - proceed normally |
|
@claude[agent] apply changes based on the comments in this thread |
- Fix Azure restart prompt to not ask for cleanup confirmation when it doesn't perform cleanup - Fix local deployment detection to honor user choice even when only artifacts exist (no state file) - Fix docker-compose-dev.yml cleanup by explicitly using -f flag - Fix misleading compose file name display to show actual file found - Add trailing slash normalization in PingURL to match Discover() behavior - Fix Azure partial state path mismatch by writing to deployAzureDir consistently Agent-Logs-Url: https://github.com/DevExpGbb/gh-devlake/sessions/66b97b76-9583-4ab4-826e-ade3b8a58ac1 Co-authored-by: ewega <26189114+ewega@users.noreply.github.com>
Applied all review feedback in commit 45a8fef:
|
cmd/deploy_azure.go
Outdated
| // Check for state file | ||
| data, err := os.ReadFile(stateFile) | ||
| if err != nil { | ||
| // No state file found - proceed normally |
There was a problem hiding this comment.
When os.ReadFile(stateFile) fails, this treats all errors as “no state file” and proceeds. For non-IsNotExist errors (e.g., permission issues), it would be safer to surface a warning or return an error, otherwise the deploy may continue without recovery context and may later fail again when attempting to write the same file.
| // No state file found - proceed normally | |
| if !os.IsNotExist(err) { | |
| fmt.Printf("\n⚠️ Could not read Azure state file %s: %v\n", stateFile, err) | |
| } | |
| // No state file found or unreadable - proceed without state |
cmd/deploy_local.go
Outdated
| fmt.Println(" 2. Start Docker Desktop and wait for it to fully initialize") | ||
| fmt.Println(" 3. Verify Docker is running: docker ps") | ||
| fmt.Println(" 4. Re-run this command: gh devlake deploy local") | ||
| return fmt.Errorf("Docker is not available — follow recovery steps above") |
There was a problem hiding this comment.
The Docker availability failure path drops the underlying error from dockerpkg.CheckAvailable(). Keeping a %w-wrapped error (while still printing the recovery steps) helps callers/debug logs distinguish “docker not installed” vs “daemon not running” vs other failures.
| return fmt.Errorf("Docker is not available — follow recovery steps above") | |
| return fmt.Errorf("Docker is not available — follow recovery steps above: %w", err) |
cmd/deploy_local.go
Outdated
| fmt.Println(" • MySQL takes longer on first run (database initialization)") | ||
| fmt.Println(" • Insufficient Docker resources (increase memory in Docker Desktop settings)") | ||
| fmt.Println(" • Port conflicts (check docker compose logs for 'address already in use')") | ||
| return "", fmt.Errorf("DevLake not ready — check logs for details") |
There was a problem hiding this comment.
The readiness-timeout return loses the underlying waitForReadyAny error (it previously carried details like the timeout/attempt count). Consider wrapping the original err with %w so the error retains diagnostic context in logs/tests while still printing the user-facing troubleshooting output above.
| return "", fmt.Errorf("DevLake not ready — check logs for details") | |
| return "", fmt.Errorf("DevLake not ready — check logs for details: %w", err) |
| stateFile := filepath.Join(absDir, ".devlake-local.json") | ||
|
|
||
| // Check for state file | ||
| state, err := devlake.LoadState(stateFile) |
There was a problem hiding this comment.
devlake.LoadState can fail due to JSON parse errors or permission issues; currently those cases are treated the same as “no state file” and no warning is shown. It’d be helpful to distinguish os.IsNotExist from other errors and surface a warning (or return an error for unreadable state) so users understand why recovery can’t use the state file.
| state, err := devlake.LoadState(stateFile) | |
| state, err := devlake.LoadState(stateFile) | |
| if err != nil && !os.IsNotExist(err) { | |
| fmt.Printf("\n⚠️ Unable to read local deployment state from %s: %v\n", stateFile, err) | |
| } |
cmd/deploy_local.go
Outdated
| // Remove state file | ||
| stateFile := filepath.Join(absDir, ".devlake-local.json") | ||
| if _, err := os.Stat(stateFile); err == nil { | ||
| os.Remove(stateFile) |
There was a problem hiding this comment.
os.Remove(stateFile) ignores its returned error. If the state file can’t be removed (permissions, read-only FS), the subsequent deploy may still be treated as “existing” on the next run. Consider checking and returning or at least warning on remove failures.
| os.Remove(stateFile) | |
| if err := os.Remove(stateFile); err != nil { | |
| fmt.Printf("\n⚠️ Failed to remove local state file %s: %v\n", stateFile, err) | |
| } |
cmd/deploy_local.go
Outdated
| fmt.Println(" gh devlake deploy local") | ||
| fmt.Println("\n💡 To clean up partial artifacts:") | ||
| fmt.Println(" gh devlake cleanup --local --force") | ||
| return "", fmt.Errorf("port conflict — stop the conflicting container and retry") |
There was a problem hiding this comment.
In the port-conflict branch, the returned error doesn’t wrap the original docker compose up error, which makes debugging harder and diverges from the pattern used elsewhere in this function (preserving the compose output/error). Consider returning a contextual error that includes %w with the original err.
| return "", fmt.Errorf("port conflict — stop the conflicting container and retry") | |
| return "", fmt.Errorf("port conflict — stop the conflicting container and retry: %w", err) |
cmd/deploy_local.go
Outdated
| fmt.Printf(" 1. Check container logs: docker compose -f \"%s/docker-compose.yml\" logs devlake\n", absDir) | ||
| fmt.Println(" 2. Verify all containers are running: docker compose ps") | ||
| fmt.Println(" 3. Check MySQL initialization: docker compose logs mysql") | ||
| fmt.Println(" 4. If containers keep restarting, check: docker compose logs") |
There was a problem hiding this comment.
The troubleshooting command hardcodes docker-compose.yml, but this command supports deployments where the compose file may be docker-compose-dev.yml (and the port-conflict guidance already handles arbitrary compose file paths). Prefer detecting which compose file exists (or tracking which file was used) and printing commands that target that file (or instruct cd into the target dir before running docker compose ...).
| fmt.Printf(" 1. Check container logs: docker compose -f \"%s/docker-compose.yml\" logs devlake\n", absDir) | |
| fmt.Println(" 2. Verify all containers are running: docker compose ps") | |
| fmt.Println(" 3. Check MySQL initialization: docker compose logs mysql") | |
| fmt.Println(" 4. If containers keep restarting, check: docker compose logs") | |
| composeFile := "docker-compose.yml" | |
| if _, err := os.Stat(filepath.Join(absDir, "docker-compose-dev.yml")); err == nil { | |
| composeFile = "docker-compose-dev.yml" | |
| } | |
| composePath := filepath.Join(absDir, composeFile) | |
| fmt.Printf(" 1. Check container logs: docker compose -f \"%s\" logs devlake\n", composePath) | |
| fmt.Printf(" 2. Verify all containers are running: docker compose -f \"%s\" ps\n", composePath) | |
| fmt.Printf(" 3. Check MySQL initialization: docker compose -f \"%s\" logs mysql\n", composePath) | |
| fmt.Printf(" 4. If containers keep restarting, check: docker compose -f \"%s\" logs\n", composePath) |
|
@claude[agent] Adress the comments in this thread #146 (review) |
- Add warning for non-IsNotExist errors when reading Azure state file - Wrap underlying errors with %w for Docker availability, readiness timeout, and port conflicts - Add warning for non-IsNotExist errors when loading local deployment state - Add warning when state file removal fails during cleanup - Detect actual compose file (docker-compose.yml vs docker-compose-dev.yml) in troubleshooting output Agent-Logs-Url: https://github.com/DevExpGbb/gh-devlake/sessions/96938f2c-308a-42bb-a21d-6de79e11ddd1 Co-authored-by: ewega <26189114+ewega@users.noreply.github.com>
Applied all review feedback in commit cddbf92:
|
Original prompt