
Harden deploy migration flow during local and Azure deploys #148

Open
Copilot wants to merge 7 commits into main from copilot/investigate-db-migration-error



Copilot AI commented Mar 25, 2026

Summary

  • add a shared migration helper that retries the trigger endpoint and always waits for migration completion afterward
  • use the resilient migration flow in both deploy local and deploy azure
  • make TriggerMigration fail on non-2xx responses and include response details when available
  • add focused tests covering timeout, retry, and trigger status handling

Validation

  • go build ./...
  • go test ./...
  • go vet ./...

Copilot AI and others added 7 commits March 25, 2026 20:52
Agent-Logs-Url: https://github.com/DevExpGbb/gh-devlake/sessions/a286b823-c9cc-41e5-9805-7ad02eb0248b

Co-authored-by: ewega <26189114+ewega@users.noreply.github.com>
Copilot AI requested a review from ewega March 25, 2026 21:01
@ewega ewega marked this pull request as ready for review March 25, 2026 22:26
Copilot AI review requested due to automatic review settings March 25, 2026 22:26

Copilot AI left a comment


Pull request overview

This PR hardens the DevLake database migration step used by the GitHub CLI extension during deploy local and deploy azure, making the trigger phase retryable and ensuring the flow always waits for migration completion.

Changes:

  • Adds a shared triggerAndWaitForMigration helper that retries the migration trigger and then polls /ping until migration completes.
  • Updates local and Azure deploy flows to use the shared resilient migration helper.
  • Improves Client.TriggerMigration() to fail on non-2xx responses and (when present) surface response body details; adds targeted tests.
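The wait phase described above can be sketched roughly as follows. This is not the PR's waitForMigration; the name waitForPing, the 5-second client timeout, and the rule that a 200 from /ping means the migration has finished are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitForPing polls baseURL+"/ping" until it answers 200 OK or the attempts
// run out. Treating 200 as "migration complete" is an assumption of this sketch.
func waitForPing(baseURL string, attempts int, interval time.Duration) error {
	client := &http.Client{Timeout: 5 * time.Second}
	for attempt := 1; attempt <= attempts; attempt++ {
		resp, err := client.Get(baseURL + "/ping")
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		// Sleep only between attempts, not after the last one.
		if attempt < attempts {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("migration did not complete after %d attempts", attempts)
}

// runDemo starts a fake server that is unavailable for the first two polls
// and ready on the third, then reports how many polls were needed.
func runDemo() (int32, error) {
	var calls atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if calls.Add(1) < 3 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	err := waitForPing(srv.URL, 5, time.Millisecond)
	return calls.Load(), err
}

func main() {
	calls, err := runDemo()
	fmt.Println("polls:", calls, "err:", err)
}
```

Bounding the loop by attempt count rather than by a deadline keeps the total wait predictable from the two arguments, which matches the (attempts, interval) parameters the helper in this PR takes.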

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| internal/devlake/client.go | Makes TriggerMigration return errors on non-2xx responses and include response body details. |
| internal/devlake/client_test.go | Adds table-driven tests for TriggerMigration status handling. |
| cmd/helpers.go | Enhances migration waiting messages and introduces the shared trigger+wait helper with retry logic. |
| cmd/helpers_migration_test.go | Adds focused tests for retry/timeout behavior around the shared migration helper. |
| cmd/deploy_local.go | Switches local deploy migration behavior to the new shared helper and improved warnings. |
| cmd/deploy_azure.go | Switches Azure deploy migration behavior to the new shared helper and improved warnings. |
Comments suppressed due to low confidence (1)

cmd/helpers.go:278

  • lastErr is never cleared when a later trigger attempt succeeds. If attempt 1 fails and attempt 2 succeeds, lastErr stays non-nil, so the function prints "Continuing to monitor…" incorrectly and may return a misleading combined error if waiting later fails. Track a triggerSucceeded bool (or reset lastErr = nil on success) and only treat trigger as failed if all attempts fail.
	var lastErr error
	for attempt := 1; attempt <= triggerAttempts; attempt++ {
		err := devlakeClient.TriggerMigration()
		if err == nil {
			fmt.Println("   ✅ Migration triggered")
			break
		}
		lastErr = err
		fmt.Printf("   ⚠️  Trigger attempt %d/%d failed: %v\n", attempt, triggerAttempts, err)
		if attempt < triggerAttempts {
			fmt.Println("   DevLake may still be starting or migration may already be running — retrying...")
			time.Sleep(triggerInterval)
		}
	}

	fmt.Println("\n⏳ Waiting for migration to complete...")
	if lastErr != nil {
		fmt.Println("   Continuing to monitor migration status anyway...")
	}
	if err := waitForMigration(baseURL, waitAttempts, waitInterval); err != nil {
		if lastErr != nil {
			return fmt.Errorf("migration trigger failed earlier (%v) and waiting for migration completion also failed: %w", lastErr, err)
		}
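
One possible shape for the fix suggested in this comment, shown with the trigger and wait steps injected as plain functions so the logic can run on its own. The name triggerAndWait, the function-typed parameters, and the wording of the wait-only error are hypothetical; returning nil when the wait succeeds after every trigger attempt failed is also an assumption, since the quoted snippet is cut off before that point.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// triggerAndWait retries trigger up to attempts times, then always calls wait.
// lastErr is reset on success, so it is non-nil only if every attempt failed.
func triggerAndWait(trigger, wait func() error, attempts int, interval time.Duration) error {
	var lastErr error
	for attempt := 1; attempt <= attempts; attempt++ {
		err := trigger()
		if err == nil {
			lastErr = nil // forget failures from earlier attempts
			break
		}
		lastErr = err
		if attempt < attempts {
			time.Sleep(interval)
		}
	}

	if err := wait(); err != nil {
		if lastErr != nil {
			return fmt.Errorf("migration trigger failed earlier (%v) and waiting for migration completion also failed: %w", lastErr, err)
		}
		return fmt.Errorf("waiting for migration completion failed: %w", err)
	}
	return nil
}

// demo: the first trigger attempt fails, the second succeeds, the wait fails.
func demo() string {
	calls := 0
	trigger := func() error {
		calls++
		if calls == 1 {
			return errors.New("status 503")
		}
		return nil
	}
	wait := func() error { return errors.New("timed out") }
	return triggerAndWait(trigger, wait, 3, time.Millisecond).Error()
}

func main() {
	fmt.Println(demo())
}
```

With the reset in place, the error from demo mentions only the wait failure; without it, the stale 503 from the first attempt would be reported as a trigger failure.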

Comment on lines +249 to +255
func triggerAndWaitForMigration(baseURL string) error {
	return triggerAndWaitForMigrationWithClient(baseURL, devlake.NewClient(baseURL), 3, 10*time.Second, 60, 5*time.Second)
}

func triggerAndWaitForMigrationWithClient(baseURL string, devlakeClient *devlake.Client, triggerAttempts int, triggerInterval time.Duration, waitAttempts int, waitInterval time.Duration) error {
	fmt.Println("\n🔄 Triggering database migration...")


Copilot AI Mar 25, 2026


triggerAndWaitForMigrationWithClient takes both baseURL and a devlakeClient that already carries a BaseURL. If these ever diverge, migration trigger and migration wait will hit different instances. Consider deriving the wait URL from devlakeClient.BaseURL (or validating they match) to avoid accidental mismatches.

This issue also appears on line 256 of the same file.
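
A minimal sketch of the "validate they match" option. The Client type here is a stand-in for devlake.Client with only the BaseURL field, and resolveBaseURL is a hypothetical helper, not something in the PR; ignoring trailing slashes when comparing is an assumption as well.

```go
package main

import (
	"fmt"
	"strings"
)

// Client is a stand-in for devlake.Client; only BaseURL matters for this sketch.
type Client struct {
	BaseURL string
}

// resolveBaseURL returns the single URL that both the trigger and the wait
// phase should use. An empty baseURL defers to the client; a non-empty one
// must agree with the client's BaseURL (ignoring trailing slashes).
func resolveBaseURL(baseURL string, c *Client) (string, error) {
	clientURL := strings.TrimRight(c.BaseURL, "/")
	if baseURL == "" {
		return clientURL, nil
	}
	if strings.TrimRight(baseURL, "/") != clientURL {
		return "", fmt.Errorf("baseURL %q does not match client BaseURL %q", baseURL, c.BaseURL)
	}
	return clientURL, nil
}

func main() {
	c := &Client{BaseURL: "http://localhost:8080/"}

	url, err := resolveBaseURL("", c)
	fmt.Println(url, err)

	_, err = resolveBaseURL("http://other-host:8080", c)
	fmt.Println(err)
}
```

Dropping the baseURL parameter entirely and reading devlakeClient.BaseURL would remove the mismatch case altogether; the validation variant is shown because it keeps the existing signature.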

Comment on lines 507 to +521
 // TriggerMigration triggers the DevLake database migration endpoint.
 func (c *Client) TriggerMigration() error {
 	resp, err := c.HTTPClient.Get(c.BaseURL + "/proceed-db-migration")
-	if err != nil {
-		return err
-	}
-	resp.Body.Close()
-	return nil
-}
+	if err != nil {
+		return fmt.Errorf("triggering migration: %w", err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode < http.StatusOK || resp.StatusCode >= http.StatusMultipleChoices {
+		body, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
+		bodyText := strings.TrimSpace(string(body))
+		if bodyText != "" {
+			return fmt.Errorf("DevLake returned status %d: %s", resp.StatusCode, bodyText)
+		}
+		return fmt.Errorf("DevLake returned status %d", resp.StatusCode)
+	}

Copilot AI Mar 25, 2026


TriggerMigration error messages drop the request context (endpoint/path), which makes logs harder to interpret compared to other client helpers that include the HTTP method and path. Consider including /proceed-db-migration (and ideally resp.Status) in the returned error so callers can quickly identify the failing call.

See below for a potential fix:

	path := "/proceed-db-migration"
	resp, err := c.HTTPClient.Get(c.BaseURL + path)
	if err != nil {
		return fmt.Errorf("GET %s: triggering migration: %w", path, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode < http.StatusOK || resp.StatusCode >= http.StatusMultipleChoices {
		body, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
		bodyText := strings.TrimSpace(string(body))
		if bodyText != "" {
			return fmt.Errorf("GET %s: DevLake returned %s: %s", path, resp.Status, bodyText)
		}
		return fmt.Errorf("GET %s: DevLake returned %s", path, resp.Status)

Comment on lines +54 to +88
func TestTriggerAndWaitForMigrationWithClient_RetriesBeforeWaiting(t *testing.T) {
	triggerCalls := 0
	pingCalls := 0

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/proceed-db-migration":
			triggerCalls++
			if triggerCalls == 1 {
				w.WriteHeader(http.StatusServiceUnavailable)
				return
			}
			w.WriteHeader(http.StatusOK)
		case "/ping":
			pingCalls++
			w.WriteHeader(http.StatusOK)
		default:
			http.NotFound(w, r)
		}
	}))
	defer srv.Close()

	client := devlake.NewClient(srv.URL)

	err := triggerAndWaitForMigrationWithClient(srv.URL, client, 2, time.Millisecond, 2, time.Millisecond)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if triggerCalls != 2 {
		t.Fatalf("trigger calls = %d, want 2", triggerCalls)
	}
	if pingCalls != 1 {
		t.Fatalf("ping calls = %d, want 1", pingCalls)
	}
}

Copilot AI Mar 25, 2026


The new migration helper has an edge case where an early trigger failure followed by a later success should not be treated as a trigger failure (and should not produce the combined "trigger failed earlier" error). Adding a focused test for "first trigger fails, later succeeds, then wait fails" would lock this behavior in and prevent regressions.
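
The scenario described here ("first trigger fails, later succeeds, then wait fails") could look roughly like the program below. It runs against a compact stand-in rather than the PR's helper: get, retry, triggerAndWait, and runScenario are hypothetical names, and the wait-only error wording is an assumption. In the repository the same server setup would sit inside a test function and call triggerAndWaitForMigrationWithClient instead.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"time"
)

// get treats transport failures and non-2xx responses as errors.
func get(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("status %d", resp.StatusCode)
	}
	return nil
}

// retry runs fn up to attempts times; it returns nil on the first success,
// otherwise the error from the final attempt.
func retry(fn func() error, attempts int, interval time.Duration) error {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(interval)
		}
	}
	return err
}

// triggerAndWait is a hypothetical stand-in for the helper under review.
func triggerAndWait(baseURL string, attempts int, interval time.Duration) error {
	triggerErr := retry(func() error { return get(baseURL + "/proceed-db-migration") }, attempts, interval)
	waitErr := retry(func() error { return get(baseURL + "/ping") }, attempts, interval)
	switch {
	case waitErr == nil:
		return nil
	case triggerErr != nil:
		return fmt.Errorf("migration trigger failed earlier (%v) and waiting for migration completion also failed: %w", triggerErr, waitErr)
	default:
		return fmt.Errorf("waiting for migration completion failed: %w", waitErr)
	}
}

// runScenario: the trigger endpoint answers 503 once and then 200, while
// /ping (and any other path) always answers 503.
func runScenario() (int32, error) {
	var triggerCalls atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/proceed-db-migration" && triggerCalls.Add(1) > 1 {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
	defer srv.Close()

	err := triggerAndWait(srv.URL, 2, time.Millisecond)
	return triggerCalls.Load(), err
}

func main() {
	calls, err := runScenario()
	fmt.Println("trigger calls:", calls)
	fmt.Println("error:", err)
	fmt.Println("blames trigger:", strings.Contains(err.Error(), "trigger failed earlier"))
}
```

The assertion that matters is the last one: after a trigger that eventually succeeded, the returned error should describe only the wait failure.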
