# Production Readiness

Launch checklist and operational best practices for Kadindexer.

## Pre-Launch Checklist

### Infrastructure

- [ ] Rate limiting configured with exponential backoff
- [ ] Caching layer deployed (permanent, balance, recent)
- [ ] HTTP connection pooling enabled
- [ ] API endpoint configured: `https://graph.kadindexer.io`
- [ ] Environment variables secured (no keys in code)


### Query Optimization

- [ ] All queries use pagination (`first: 50`)
- [ ] Server-side filters applied (chainId, accountName, minHeight)
- [ ] Query complexity under tier limits
- [ ] Query variables used for all user input
- [ ] Only necessary fields selected
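
The pagination and query-variable items can be combined in one loop. The sketch below assumes a Relay-style connection (`edges`, `pageInfo.hasNextPage`, `pageInfo.endCursor`) and `first`/`after` arguments, which this page does not confirm; check the API reference for the actual field names. `fetchAllPages` and its `fetchPage` callback are hypothetical names.

```javascript
// Collects all nodes from a cursor-paginated connection.
// `fetchPage` is a hypothetical callback: it receives { first, after } as
// query variables and resolves with
// { edges: [{ node }], pageInfo: { hasNextPage, endCursor } }.
async function fetchAllPages(fetchPage, { pageSize = 50, maxPages = 20 } = {}) {
  const nodes = [];
  let after = null;

  // maxPages bounds the total work so one call cannot exhaust the rate limit
  for (let page = 0; page < maxPages; page++) {
    const connection = await fetchPage({ first: pageSize, after });
    for (const edge of connection.edges) nodes.push(edge.node);

    if (!connection.pageInfo.hasNextPage) break;
    after = connection.pageInfo.endCursor;
  }
  return nodes;
}
```

Passing `first` and `after` as variables, rather than interpolating them into the query string, keeps user input out of the query text.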


### Error Handling

- [ ] 429 rate limit retry logic implemented
- [ ] Network error handling with timeouts
- [ ] GraphQL error parsing configured
- [ ] User-friendly error messages


### Monitoring

- [ ] Request metrics tracked (rate, latency, errors)
- [ ] Rate limit quota monitoring
- [ ] Cache hit rate tracking
- [ ] Performance dashboard live
- [ ] Alerts configured for critical thresholds


### Security

- [ ] No API keys in version control
- [ ] HTTPS enforced
- [ ] Input validation on user queries
- [ ] Query complexity limits respected


## Error Handling

Handle errors gracefully to maintain user experience during failures.

### Common Error Types

| Status | Type | Meaning | Action |
|  --- | --- | --- | --- |
| **400** | Bad Request | Invalid query syntax | Log error, fix query |
| **429** | Rate Limited | Burst limit exceeded | Exponential backoff + retry |
| **500** | Server Error | Kadindexer issue | Retry up to 3 times |
| **503** | Unavailable | Temporary outage | Retry with backoff |
| **Network** | Timeout/DNS | Connection failure | Enforce a request timeout, then retry with backoff |
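
The table can be expressed as one small decision function, so retry policy lives in a single place. This is a minimal sketch; `classifyFailure` and the shape of its return value are illustrative and not part of any Kadindexer client.

```javascript
// Maps an HTTP status (or null for a network failure) to a retry decision,
// following the table above.
function classifyFailure(status) {
  if (status === null) return { retry: true, backoff: 'linear' };      // timeout/DNS
  if (status === 429) return { retry: true, backoff: 'exponential' };  // rate limited
  if (status >= 500) return { retry: true, backoff: 'linear' };        // 500, 503
  return { retry: false, backoff: null };                              // 400 and other 4xx
}
```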


### Implementation Pattern


```javascript
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

async function robustQuery(query, variables, options = {}) {
  const { maxRetries = 3, timeout = 30000 } = options;
  let lastError;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), timeout);
    let delay;

    try {
      const response = await fetch('https://graph.kadindexer.io', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query, variables }),
        signal: controller.signal
      });

      if (response.ok) {
        return await response.json();
      }

      const status = response.status;

      // Client error (anything below 500 except 429) - don't retry
      if (status < 500 && status !== 429) {
        const body = await response.json().catch(() => ({}));
        const error = new Error(`Query failed: ${body.errors?.[0]?.message || body.message || status}`);
        error.retryable = false;
        throw error;
      }

      // Rate limited - exponential backoff capped at 60s; server error - linear backoff
      delay = status === 429
        ? Math.min(1000 * Math.pow(2, attempt), 60000)
        : 1000 * (attempt + 1);
      lastError = new Error(status === 429 ? 'Rate limited' : `Server error ${status}`);
    } catch (error) {
      // Only timeouts and refused connections are retried; everything else is rethrown.
      // Node's fetch reports the socket error code on error.cause.
      const code = error.code || error.cause?.code;
      if (error.retryable === false || (error.name !== 'AbortError' && code !== 'ECONNREFUSED')) {
        throw error;
      }
      lastError = error;
      delay = 1000 * (attempt + 1);
    } finally {
      clearTimeout(timeoutId);
    }

    // Wait before the next attempt, but not after the last one
    if (attempt < maxRetries - 1) {
      console.warn(`${lastError.message}, retrying in ${delay}ms`);
      await sleep(delay);
    }
  }

  // Every attempt failed; throw instead of silently returning undefined
  throw new Error(`Query failed after ${maxRetries} attempts: ${lastError.message}`);
}
```
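
Fixed backoff delays can make many clients retry at the same instant. One common refinement is to add random jitter and to honor a `Retry-After` header when the server sends one. The sketch below is an assumption-laden illustration: this page does not say whether Kadindexer sends `Retry-After`, and `computeDelay` is a hypothetical helper.

```javascript
// Computes how long to wait before retry number `attempt` (0-based).
// `retryAfter` is the raw Retry-After header value, if any. Only the
// delay-in-seconds form is handled here; the HTTP-date form is ignored.
// `random` is injectable so the jitter can be made deterministic in tests.
function computeDelay(attempt, retryAfter = null, random = Math.random) {
  if (retryAfter !== null && String(retryAfter).trim() !== '') {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, 60000);
    }
  }

  // Exponential ceiling, capped at 60s like the 429 branch above
  const ceiling = Math.min(1000 * 2 ** attempt, 60000);
  // "Full jitter": pick a random delay between 0 and the ceiling
  return Math.floor(random() * ceiling);
}
```

Inside `robustQuery`, `computeDelay(attempt, response.headers.get('Retry-After'))` could stand in for the fixed 429 delay.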

### GraphQL Error Handling

A GraphQL response can include an `errors` array alongside partial `data`, even when the HTTP status is 200, so inspect both:


```javascript
async function requestData(query, variables) {
  // Assumes the client resolves with the raw response body ({ data, errors })
  // instead of throwing when the response contains errors
  const result = await client.request(query, variables);

  // Check for GraphQL errors
  if (result.errors) {
    result.errors.forEach(error => {
      console.error('GraphQL Error:', {
        message: error.message,
        path: error.path,
        extensions: error.extensions
      });
    });

    // Decide whether to use partial data or throw
    if (!result.data) {
      throw new Error('Query returned no data');
    }
  }

  // Use data if available
  return result.data;
}
```
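
The "use partial data or throw" decision can be made explicit by naming the fields a caller cannot do without. This is a minimal sketch; `resolveResult` and `requiredFields` are hypothetical names, and it relies on the GraphQL convention that an error's `path` starts with the affected top-level field.

```javascript
// Decides what to do with a GraphQL response body of shape { data, errors }.
// `requiredFields` lists top-level fields the caller cannot work without.
function resolveResult(result, requiredFields = []) {
  const errors = result.errors || [];
  if (errors.length === 0) return { data: result.data, partial: false };

  if (!result.data) {
    throw new Error(`Query returned no data: ${errors[0].message}`);
  }

  // A failed field is typically null in `data` and named first in the error's `path`
  const failed = new Set(errors.map(e => e.path?.[0]).filter(Boolean));
  const missing = requiredFields.filter(f => failed.has(f) || result.data[f] == null);
  if (missing.length > 0) {
    throw new Error(`Required fields failed: ${missing.join(', ')}`);
  }

  // Errors only touched optional fields, so the partial data is usable
  return { data: result.data, partial: true };
}
```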

### User-Facing Error Messages

Translate technical errors into user-friendly messages:


```javascript
function getUserMessage(error) {
  if (error.response?.status === 429) {
    return 'Too many requests. Please wait a moment and try again.';
  }
  
  if (error.response?.status >= 500) {
    return 'Service temporarily unavailable. Please try again shortly.';
  }
  
  if (error.message?.includes('complexity')) {
    return 'Query too complex. Please reduce the amount of data requested.';
  }
  
  if (error.name === 'AbortError') {
    return 'Request timed out. Please check your connection.';
  }
  
  return 'An error occurred. Please try again.';
}
```

## High Availability Strategies

### Graceful Degradation

Provide cached or limited data when primary queries fail:


```javascript
async function getAccountBalance(accountName, chainId) {
  // Key the cache per account and chain
  const cacheKey = `${accountName}:${chainId}`;

  try {
    // Try live query
    const result = await client.request(balanceQuery, { accountName, chainId });
    balanceCache.set(cacheKey, result);
    return result;
  } catch (error) {
    // Fall back to cached data
    const cached = balanceCache.get(cacheKey);
    if (cached) {
      console.warn('Using cached balance due to error:', error.message);
      return { ...cached, stale: true };
    }
    throw error;
  }
}
```
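
The snippet above leaves `balanceCache` undefined. One possible shape is a small in-memory cache that separates fresh reads from stale fallbacks and reports how old a stale entry is. `StaleCache` is a hypothetical class for illustration; a production deployment might use a shared cache instead.

```javascript
// Minimal in-memory cache with a freshness window.
// `now` is injectable so expiry can be tested without waiting.
class StaleCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  set(key, value) {
    this.entries.set(key, { value, storedAt: this.now() });
  }

  // Returns the value only while it is still within the TTL
  getFresh(key) {
    const entry = this.entries.get(key);
    if (!entry || this.now() - entry.storedAt > this.ttlMs) return undefined;
    return entry.value;
  }

  // Returns the value regardless of age, plus its age in ms, for fallbacks
  getStale(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    return { value: entry.value, ageMs: this.now() - entry.storedAt };
  }
}
```

Exposing `ageMs` lets the UI tell users how old a stale balance is rather than only flagging it as stale.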

### Circuit Breaker Pattern

Prevent cascading failures by temporarily stopping requests:


```javascript
class CircuitBreaker {
  constructor(threshold = 5, timeout = 60000) {
    this.failureCount = 0;
    this.threshold = threshold;
    this.timeout = timeout;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.nextAttempt = Date.now();
  }
  
  async execute(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN');
      }
      this.state = 'HALF_OPEN';
    }
    
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }
  
  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }
  
  onFailure() {
    this.failureCount++;
    if (this.failureCount >= this.threshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.timeout;
      console.error('Circuit breaker opened');
    }
  }
}

// Usage
const breaker = new CircuitBreaker();

async function queryWithBreaker(query, variables) {
  return breaker.execute(() => client.request(query, variables));
}
```

## Deployment Strategy

### Environment Configuration

Use different configurations per environment:


```javascript
const config = {
  development: {
    endpoint: 'https://graph.kadindexer.io',
    timeout: 10000,
    retries: 1,
    logLevel: 'debug'
  },
  staging: {
    endpoint: 'https://graph.kadindexer.io',
    timeout: 30000,
    retries: 3,
    logLevel: 'info'
  },
  production: {
    endpoint: 'https://graph.kadindexer.io',
    timeout: 30000,
    retries: 3,
    logLevel: 'warn'
  }
};

const env = process.env.NODE_ENV || 'development';
// Fall back to development settings for unrecognized values such as 'test'
export default config[env] || config.development;
```
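
To keep the endpoint out of source and enforce HTTPS at startup, the value can be read from the environment and validated once. This is a sketch under assumptions: `KADINDEXER_ENDPOINT` is a hypothetical variable name, and `resolveEndpoint` is not part of any Kadindexer tooling.

```javascript
// Resolves the endpoint from an environment map (defaults to process.env).
// KADINDEXER_ENDPOINT is a hypothetical variable name chosen for this sketch.
function resolveEndpoint(env = process.env) {
  const raw = env.KADINDEXER_ENDPOINT || 'https://graph.kadindexer.io';
  const url = new URL(raw); // throws a TypeError on malformed input

  if (url.protocol !== 'https:') {
    throw new Error(`Endpoint must use HTTPS, got "${url.protocol}"`);
  }
  return url.toString();
}
```

Failing at startup on a bad value is usually easier to diagnose than a failed request later.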

### Health Checks

Implement health checks to verify Kadindexer connectivity:


```javascript
async function healthCheck() {
  try {
    const result = await client.request(
      `query { graphConfiguration { version } }`,
      {},
      { timeout: 5000 }
    );
    
    return {
      status: 'healthy',
      version: result.graphConfiguration.version,
      timestamp: new Date().toISOString()
    };
  } catch (error) {
    return {
      status: 'unhealthy',
      error: error.message,
      timestamp: new Date().toISOString()
    };
  }
}

// Run health check every 60 seconds
setInterval(async () => {
  const health = await healthCheck();
  if (health.status === 'unhealthy') {
    console.error('Kadindexer health check failed:', health.error);
  }
}, 60000);
```
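
A single failed check is often just a network blip. The sketch below counts consecutive failures and signals an alert only when a threshold is crossed; `createHealthTracker` and its return labels are illustrative names, and the default threshold of 3 is an arbitrary starting point.

```javascript
// Tracks consecutive health-check failures so one blip does not page anyone.
// Call the returned function with each healthCheck() result.
function createHealthTracker(failureThreshold = 3) {
  let consecutiveFailures = 0;

  return function record(health) {
    if (health.status === 'healthy') {
      const recovered = consecutiveFailures >= failureThreshold;
      consecutiveFailures = 0;
      return recovered ? 'recovered' : 'ok';
    }

    consecutiveFailures++;
    // Alert once when the threshold is crossed, then stay quiet
    if (consecutiveFailures === failureThreshold) return 'alert';
    return consecutiveFailures > failureThreshold ? 'down' : 'degraded';
  };
}
```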

### Gradual Rollout

Deploy to a subset of users first:


```javascript
function shouldUseKadindexer(userId) {
  // Gradual rollout based on user ID
  const rolloutPercentage = 10; // Start with 10%
  const hash = userId.split('').reduce((a, b) => {
    a = ((a << 5) - a) + b.charCodeAt(0);
    return a & a;
  }, 0);
  
  return Math.abs(hash % 100) < rolloutPercentage;
}
```
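
A useful property of hash-based rollout is that each user lands in a fixed bucket, so raising the percentage only adds users and never removes them. The sketch below restates the same hash with the percentage as a parameter to make that property easy to check; `hashUserId` and `isInRollout` are illustrative names.

```javascript
// Same string hash as above, reduced to a bucket from 0 to 99.
function hashUserId(userId) {
  const text = String(userId);
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) - hash + text.charCodeAt(i)) | 0; // keep it a 32-bit integer
  }
  return Math.abs(hash % 100);
}

// A user is included when their bucket falls below the rollout percentage
function isInRollout(userId, percentage) {
  return hashUserId(userId) < percentage;
}
```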

## Monitoring & Observability

### Key Metrics Dashboard

Track these metrics in your monitoring system:


```javascript
const metrics = {
  // Request metrics
  totalRequests: 0,
  successfulRequests: 0,
  failedRequests: 0,
  
  // Performance metrics
  avgResponseTime: 0,
  p95ResponseTime: 0,
  p99ResponseTime: 0,
  
  // Resource metrics
  cacheHitRate: 0,
  rateLimitUsage: 0,
  
  // Error metrics
  errorsByType: {},
  
  // Business metrics
  activeUsers: 0,
  queriesPerUser: 0
};

function updateMetrics(result, duration, error) {
  metrics.totalRequests++;
  
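  if (error) {
    metrics.failedRequests++;
    // Fall back to the error name so untyped errors are not grouped under "undefined"
    const type = error.type || error.name || 'unknown';
    metrics.errorsByType[type] = (metrics.errorsByType[type] || 0) + 1;
  } else {
    metrics.successfulRequests++;
  }
  
  // Update response time (exponential moving average, seeded with the first sample)
  metrics.avgResponseTime = metrics.totalRequests === 1
    ? duration
    : (metrics.avgResponseTime * 0.95) + (duration * 0.05);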
}
```
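
The `metrics` object declares `p95ResponseTime` and `p99ResponseTime`, but `updateMetrics` never fills them in. One simple way to compute them is a sliding window of recent durations with a nearest-rank percentile. `LatencyWindow` is a hypothetical helper; a metrics library with histogram support would be the usual choice at scale.

```javascript
// Keeps the most recent `capacity` durations and reports percentiles from them.
class LatencyWindow {
  constructor(capacity = 1000) {
    this.capacity = capacity;
    this.samples = [];
  }

  record(durationMs) {
    this.samples.push(durationMs);
    // Drop the oldest sample once the window is full
    if (this.samples.length > this.capacity) this.samples.shift();
  }

  // Nearest-rank percentile: the smallest sample with at least p% at or below it
  percentile(p) {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.max(rank, 1) - 1];
  }
}
```

After each request, `window.record(duration)` followed by `metrics.p95ResponseTime = window.percentile(95)` would keep the dashboard value current.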

### Logging Best Practices

Log relevant information without exposing sensitive data:


```javascript
function logQuery(query, variables, result, duration) {
  const logData = {
    timestamp: new Date().toISOString(),
    // Assumes `query` is a parsed document; plain query strings log as 'anonymous'
    queryName: query.definitions?.[0]?.name?.value || 'anonymous',
    duration,
    success: !result.errors,
    // Sanitize variables - remove sensitive data
    variables: sanitizeVariables(variables)
  };
  
  if (result.errors) {
    logData.errors = result.errors.map(e => ({
      message: e.message,
      path: e.path
    }));
  }
  
  if (duration > 1000) {
    console.warn('Slow query:', logData);
  } else {
    console.info('Query:', logData);
  }
}

function sanitizeVariables(variables) {
  // Remove or redact sensitive fields
  const sanitized = { ...variables };
  if (sanitized.privateKey) delete sanitized.privateKey;
  if (sanitized.signature) sanitized.signature = '***';
  return sanitized;
}
```
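
`sanitizeVariables` only inspects top-level keys, so a sensitive value nested inside an input object would still be logged. A recursive variant is sketched below; `redactDeep` is a hypothetical name and the key list is illustrative, not exhaustive.

```javascript
// Keys to redact, compared case-insensitively. Extend for your own payloads.
const SENSITIVE_KEYS = new Set(['privatekey', 'signature', 'secret', 'password']);

// Returns a redacted deep copy; the input object is never modified.
function redactDeep(value) {
  if (Array.isArray(value)) return value.map(redactDeep);
  if (value === null || typeof value !== 'object') return value;

  const out = {};
  for (const [key, inner] of Object.entries(value)) {
    out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '***' : redactDeep(inner);
  }
  return out;
}
```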

## Incident Response

### Detection & Alerting

Set up alerts for critical issues:


```javascript
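function checkAlerts() {
  // Skip until there is traffic: 0 / 0 is NaN, and an unused cache would raise a false alarm
  if (metrics.totalRequests === 0) return;

  // High error rate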
  const errorRate = metrics.failedRequests / metrics.totalRequests;
  if (errorRate > 0.05) {
    sendAlert('HIGH_ERROR_RATE', `Error rate: ${(errorRate * 100).toFixed(2)}%`);
  }
  
  // Approaching rate limit
  if (metrics.rateLimitUsage > 0.9) {
    sendAlert('RATE_LIMIT_WARNING', `Using ${(metrics.rateLimitUsage * 100).toFixed(0)}% of rate limit`);
  }
  
  // Slow queries
  if (metrics.p95ResponseTime > 2000) {
    sendAlert('SLOW_QUERIES', `P95 response time: ${metrics.p95ResponseTime}ms`);
  }
  
  // Low cache hit rate
  if (metrics.cacheHitRate < 0.5) {
    sendAlert('LOW_CACHE_HIT_RATE', `Cache hit rate: ${(metrics.cacheHitRate * 100).toFixed(0)}%`);
  }
}

setInterval(checkAlerts, 60000); // Check every minute
```
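
Because `checkAlerts` runs every minute, a sustained condition would send the same alert every minute. The sketch below wraps a sender with a per-type cooldown. `sendAlert` is not defined on this page, so `createThrottledAlerter` takes the sender as a parameter; both the wrapper name and the 15-minute default are assumptions.

```javascript
// Wraps an alert sender so each alert type fires at most once per cooldown.
// `now` is injectable so the cooldown can be tested without waiting.
function createThrottledAlerter(send, cooldownMs = 15 * 60 * 1000, now = Date.now) {
  const lastSent = new Map();

  return function alert(type, message) {
    const previous = lastSent.get(type);
    if (previous !== undefined && now() - previous < cooldownMs) {
      return false; // still cooling down; suppress the duplicate
    }
    lastSent.set(type, now());
    send(type, message);
    return true;
  };
}
```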

### Response Procedures

**When errors spike:**

1. Check [Kadindexer status](https://kadindexer.io), if a status page is available
2. Review recent deployments (roll back if needed)
3. Examine error logs for patterns
4. Verify rate limits haven't been exceeded
5. Contact support if the problem is on the Kadindexer side: [toni@hackachain.io](mailto:toni@hackachain.io)


**When performance degrades:**

1. Check query complexity and pagination sizes
2. Verify caching is working correctly
3. Review recent query changes
4. Check network connectivity
5. Consider a tier upgrade if high load is sustained


## Support Channels

### By Tier

**Basic (Free):**

- Email: [toni@hackachain.io](mailto:toni@hackachain.io)
- Community support
- Response time: Best effort


**Developer:**

- Email: [toni@hackachain.io](mailto:toni@hackachain.io)
- Priority support
- Response time: 48 hours


**Team:**

- Dedicated support team
- Email: [toni@hackachain.io](mailto:toni@hackachain.io)
- Priority escalation
- Response time: 24 hours


### When to Contact Support

- Persistent 500/503 errors
- Unexpected rate limiting
- Query complexity issues
- Data inconsistencies
- Feature requests
- Tier upgrades


## Final Checklist

Before going live:

**Infrastructure:**

- [ ] Rate limiting with exponential backoff
- [ ] Caching with appropriate TTLs
- [ ] Connection pooling enabled
- [ ] Circuit breaker implemented
- [ ] Health checks running


**Monitoring:**

- [ ] Metrics collection active
- [ ] Logging configured
- [ ] Alerts set up
- [ ] Dashboard created
- [ ] On-call rotation defined


**Error Handling:**

- [ ] Retry logic for 429/500/503
- [ ] No retry for 400 errors
- [ ] User-friendly error messages
- [ ] Graceful degradation strategy


**Security:**

- [ ] API keys in environment variables
- [ ] HTTPS enforced
- [ ] Input validation implemented
- [ ] Sensitive data not logged


**Performance:**

- [ ] Queries optimized
- [ ] Query complexity under limits
- [ ] Appropriate tier selected
- [ ] Load tested


## Resources

- [Query Optimization →](/guides/advanced/query-optimization)
- [Performance & Scaling →](/guides/advanced/performance-scaling)
- [GraphQL API Reference →](https://docs.kadindexer.io/apis)


**Need help?** [toni@hackachain.io](mailto:toni@hackachain.io)