"Minimum free storage space" / Recovering server from full disk #2350

Open
Sebbl22 opened this issue Jan 13, 2025 · 2 comments

Sebbl22 commented Jan 13, 2025

Describe the bug
Our "Minimum free storage space" setting is 524288000 bytes, which should be 500 MB. This setting seems mostly to be honored, but server recovery from a full disk is still not possible.
[3 screenshots]
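As a quick check of the configured value, the arithmetic can be sketched as follows. This assumes "500 MB" is meant in binary units (1 MB = 1024 × 1024 bytes); the helper name `bytes_to_mib` is only illustrative and is not part of Seq.

```python
MIB = 1024 * 1024  # bytes per binary megabyte (assumed unit)

def bytes_to_mib(n_bytes):
    """Convert a byte count to binary megabytes."""
    return n_bytes / MIB

configured = 524_288_000  # value from the "Minimum free storage space" setting
print(bytes_to_mib(configured))  # 500.0
```

In decimal units the same value is about 524 MB, so the setting is 500 MB only under the binary interpretation.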

Error when retention processing:

{"@t":"2025-01-13T00:45:57.2087666Z","@sp":"a16c2d27a8a88597","@tr":"cd6f9ff0dd79b8a837b1cd536f97f6a5","@mt":"Failed to apply retention policy {RetentionPolicyId}","@l":"Error","@x":"Flare.Ffi.Result.FlareException: Flare native storage failed (IOError), error attempting to write an event.\n caused by: error writing an event\n caused by: I/O error at path: "D:\\Seq\\Stream\\stream.795a5425104b42ab870099faec49152c.tmp"\n caused by: There is not enough space on the disk. (os error 112)\r\n at Seq.Engine.Storage.StorageEngine.QueryInternal(Cursor cursor, NativeCancellationToken nativeCancel, Object[] sharedColumnBuffer, CancellationToken cancel)+MoveNext()\r\n at Seq.Engine.Events.EventStore.QueryInternal(IEnumerable1 rowset, Boolean disableReadRateLimit)+MoveNext()\r\n at System.Linq.Enumerable.TryGetSingle[TSource](IEnumerable1 source, Boolean& found)\r\n at Seq.Engine.Queries.DataStore.ReclaimStorage(DateTimeRangeBounds bounds, String filter, IndexExpression indexExpression, CancellationToken cancel)\r\n at Seq.Server.Features.Retention.RetentionProcessor.ApplyPolicy(RetentionPolicy retentionPolicy, RetentionPolicyBookmark bookmark, IReadOnlyDictionary`2 associatedSignals, CancellationToken cancel)\r\n at Seq.Server.Features.Retention.RetentionProcessor.Apply(CancellationToken cancel)","RetentionPolicyId":"retentionpolicy-11362","SourceContext":"Seq.Server.Features.Retention.RetentionProcessor"}

Error when manually deleting events:

httpContext)","RequestMethod":"DELETE","RequestPath":"/api/events/signal","StatusCode":500,"Elapsed":3.3613,"ErrorToken":"0374f4d3f4354827b3c4eb0144e43d63","RequestProtocol":"HTTP/2","RequestHost":"salog.netto.lan","RequestHeaders":{"Content-Length":"225","Content-Type":"text/plain;charset=UTF-8","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) Gecko/20100101 Firefox/134.0"},"SourceContext":"Seq.Server.Web.Middleware.RequestCompletionMiddleware","RequestId":"800003fd-0000-1f00-b63f-84710c7967bb"}
{"@t":"2025-01-13T06:50:11.8342259Z","@sp":"e0dc8a8731ca29df","@tr":"b162b1e42777a2bd168bda35c4732733","@mt":"Alert {AlertId} could not be processed; no error notification was sent","@l":"Error","@x":"Flare.Ffi.Result.FlareException: Flare native storage failed (IOError), failed to write a document collection.\n caused by: error attempting to commit a document collection\n caused by: failed to write a checkpoint\n caused by: flush error\n caused by: I/O error at path: "D:\\Seq\\Documents\\metastore.collection.7ed870bec6904e5f96033b2bdc3b0152.docc"\n caused by: There is not enough space on the disk. (os error 112)\r\n at Flare.Store.BeginWrite(String source)\r\n at Quince.Storage.Native.NativeDocumentTransaction.Begin(Store store, ISharedDocumentStoreCache sharedCache, IEndTransactionListener endTransactionListener)\r\n at Quince.Storage.Native.NativeDocumentSession.Write[TResult](Func3 withWriter)\r\n at Seq.Server.Features.Alerting.AlertProcessor.PersistState(String alertId, DateTime checkTime, Nullable1 suppressedUntil, AlertOccurrence occurrence, String[] failures)\r\n at Seq.Server.Features.Alerting.AlertProcessor.CheckAlert(AlertProcessorEntry entry, CancellationToken cancel)","AlertId":"alert-5537","SourceContext":"Seq.Server.Features.Alerting.AlertProcessor"}
{"@t":"2025-01-13T06:50:11.8344294Z","@sp":"e0dc8a8731ca29df","@tr":"b162b1e42777a2bd168bda35c4732733","@mt":"Check alert {AlertId}","@l":"Error","@x":"Flare.Ffi.Result.FlareException: Flare native storage failed (IOError), failed to write a document collection.\n caused by: error attempting to commit a document collection\n caused by: failed to write a checkpoint\n caused by: flush error\n caused by: I/O error at path: "D:\\Seq\\Documents\\metastore.collection.7ed870bec6904e5f96033b2bdc3b0152.docc"\n caused by: There is not enough space on the disk. (os error 112)\r\n at Flare.Store.BeginWrite(String source)\r\n at Quince.Storage.Native.NativeDocumentTransaction.Begin(Store store, ISharedDocumentStoreCache sharedCache, IEndTransactionListener endTransactionListener)\r\n at Quince.Storage.Native.NativeDocumentSession.Write[TResult](Func3 withWriter)\r\n at Seq.Server.Features.Alerting.AlertProcessor.PersistState(String alertId, DateTime checkTime, Nullable1 suppressedUntil, AlertOccurrence occurrence, String[] failures)\r\n at Seq.Server.Features.Alerting.AlertProcessor.CheckAlert(AlertProcessorEntry entry, CancellationToken cancel)","AlertId":"alert-5537","@st":"2025-01-13T06:50:11.8120666Z","SourceContext":"Seq.Server.Features.Alerting.AlertProcessor"}
{"@t":"2025-01-13T06:50:11.8495110Z","@mt":"Unhandled exception raised while checking alerts","@l":"Error","@x":"Flare.Ffi.Result.FlareException: Flare native storage failed (IOError), failed to write a document collection.\n caused by: error attempting to commit a document collection\n caused by: failed to write a checkpoint\n caused by: flush error\n caused by: I/O error at path: "D:\\Seq\\Documents\\metastore.collection.b3b665943eb94a00b942751e1591eb2c.docc"\n caused by: There is not enough space on the disk. (os error 112)\r\n at Flare.Store.BeginWrite(String source)\r\n at Quince.Storage.Native.NativeDocumentTransaction.Begin(Store store, ISharedDocumentStoreCache sharedCache, IEndTransactionListener endTransactionListener)\r\n at Quince.Storage.Native.NativeDocumentSession.Write[TResult](Func3 withWriter)\r\n at Seq.Server.Features.Alerting.AlertProcessor.PersistState(String alertId, DateTime checkTime, Nullable1 suppressedUntil, AlertOccurrence occurrence, String[] failures)\r\n at Seq.Server.Features.Alerting.AlertProcessor.CheckAlert(AlertProcessorEntry entry, CancellationToken cancel)\r\n at Seq.Server.Features.Alerting.AlertProcessor.CheckAlerts(CancellationToken cancel)\r\n at Seq.Server.Features.Alerting.AlertProcessor.Run(CancellationToken cancel)"}
{"@t":"2025-01-13T06:50:53.3379720Z","@sp":"aee97e747e36d3e4","@tr":"9db8cfffd601508327bed9d875f593b6","@mt":"Check alert {AlertId}","AlertId":"alert-4641","@st":"2025-01-13T06:50:53.2926918Z","SourceContext":"Seq.Server.Features.Alerting.AlertProcessor"}
{"@t":"2025-01-13T06:51:00.5592205Z","@sp":"0541e6a66c85a64c","@tr":"d29ede6065c83e7bd54cf1c4f085e56e","@mt":"Potentially performance-degrading imperative event deletion requested","@l":"Warning","Principal":{"Id":"user-admin","OnBehalfOfUserId":"user-admin"},"ActionId":"d9935125-9bd2-4051-a912-00e627e45b93","ActionName":"Seq.Server.Web.Api.EventsController.DeleteInSignal (Seq)","RequestId":"80050504-0000-8800-b63f-84710c7967bb","RequestPath":"/api/events/signal"}
{"@t":"2025-01-13T06:51:00.5612726Z","@sp":"0541e6a66c85a64c","@tr":"d29ede6065c83e7bd54cf1c4f085e56e","@mt":"HTTP {RequestMethod} {RequestPath} responded {StatusCode} in {Elapsed:0.0000} ms","@r":["3.5629"],"@l":"Error","@x":"Flare.Ffi.Result.FlareException: Flare native storage failed (InternalError), error beginning a transaction.\n caused by: an earlier failure may have corrupted internal state; the store will need to be reopened to continue\r\n at Seq.Engine.Storage.StorageEngine.Query(Span1 query, Object[] sharedColumnBuffer, CancellationToken cancel)\r\n at Seq.Engine.Events.EventStore.Query(Span1 query, Object[] sharedColumnBuffer, Boolean disableReadRateLimit, CancellationToken cancel)\r\n at Seq.Engine.Events.IEventStore.Query(String query, Object[] sharedColumnBuffer, Boolean disableReadRateLimit, CancellationToken cancel)\r\n at Seq.Engine.Events.EventStore.Execute(String query, CancellationToken cancel)\r\n at Seq.Engine.Queries.DataStore.Delete(DateTimeRangeBounds bounds, String filter, IndexExpression indexExpr, CancellationToken cancel)\r\n at Seq.Server.Web.Api.EventsController.<>c__DisplayClass18_0.b__1(CancellationToken ct)\r\n at Seq.Engine.Workers.WorkerPool.<>c__DisplayClass6_0.b__0(CancellationToken ct)\r\n at Seq.Engine.Workers.WorkerPool.<>c__DisplayClass7_01.<Run>g__DoWork|0()\r\n at System.Threading.Tasks.Task1.InnerInvoke()\r\n at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)\r\n--- End of stack trace from previous location ---\r\n at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)\r\n at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)\r\n--- End of stack trace from previous location ---\r\n at Seq.Server.Web.Api.EventsController.DeleteInSignal(EvaluationContext evaluationContext)\r\n at 
Seq.Server.Web.Api.EventsController.DeleteInSignal()\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.TaskOfIActionResultExecutor.Execute(ActionContext actionContext, IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask`1 actionResultValueTask)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|25_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow(ResourceExecutedContextSealed context)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|20_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)\r\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)\r\n at 
Microsoft.AspNetCore.RateLimiting.RateLimitingMiddleware.InvokeInternal(HttpContext context, EnableRateLimitingAttribute enableRateLimitingAttribute)\r\n at Seq.Server.Web.Middleware.WebSocketAcceptMiddleware.Invoke(HttpContext context)\r\n at Seq.Server.Web.Middleware.RequestAuthenticationMiddleware.Invoke(HttpContext httpContext)\r\n at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)\r\n at Seq.Server.Web.Middleware.HstsMiddleware.Invoke(HttpContext context)\r\n at Seq.Server.Web.Middleware.ServerStatusMiddleware.Invoke(HttpContext context)\r\n at Seq.Server.Web.Middleware.UniversalHeadersMiddleware.Invoke(HttpContext context)\r\n at Seq.Server.Web.Middleware.RequestCompletionMiddleware.Invoke(HttpContext httpContext)","RequestMethod":"DELETE","RequestPath":"/api/events/signal","StatusCode":500,"Elapsed":3.5629,"ErrorToken":"da25a13a261d4d0382553e88b87dd3c8","RequestProtocol":"HTTP/2","RequestHost":"salog.netto.lan","RequestHeaders":{"Content-Length":"225","Content-Type":"text/plain;charset=UTF-8","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) Gecko/20100101 Firefox/134.0"},"SourceContext":"Seq.Server.Web.Middleware.RequestCompletionMiddleware","RequestId":"80050504-0000-8800-b63f-84710c7967bb"}
Sometimes (see screenshot above), the free disk space falls below the configured "Minimum free storage space". I started troubleshooting around 7:40, but there are visible dips below 500 MB even before that.
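To see which files the failed writes were targeting, the log excerpts above can be scanned for lines reporting Windows error 112 (disk full). The following is a minimal sketch that works on the log text as pasted here, using a plain regular expression rather than a JSON parser; the function name `disk_full_paths` is hypothetical.

```python
import re
from collections import Counter

# Matches the quoted path in messages such as:
#   I/O error at path: "D:\\Seq\\Stream\\stream.<id>.tmp"
PATH_RE = re.compile(r'I/O error at path: "([^"]+)"')

def disk_full_paths(lines):
    """Count the paths named in log lines that report os error 112."""
    counts = Counter()
    for line in lines:
        if "os error 112" not in line:
            continue  # not a disk-full error
        for raw in PATH_RE.findall(line):
            # The pasted logs show doubled backslashes; collapse them.
            counts[raw.replace("\\\\", "\\")] += 1
    return counts
```

Applied to the lines above, this would report paths under both `D:\Seq\Stream` and `D:\Seq\Documents`, which suggests that event storage and the document store were both affected.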

To Reproduce
Steps to reproduce the behavior:

  1. Configure "Minimum free storage space"
  2. Ingest events until the disk is full (Seq "Disc full" error appears)
  3. Try to alter the retention policies or manually delete events
  4. Either "An unhandled error occurred" or "Deleting events failed" is displayed
    [screenshot]

Expected behavior
The "minimum free storage space" option should allow an administrator to safely recover Seq from a full disk, either by manually deleting events or by applying a tighter retention policy.
The free disk storage should never fall below the "minimum free storage space" option under normal operation.

If 500 MB is too low for "minimum free storage space", additional guidance or a more suitable default value could be added to Seq or its documentation.

Environment (please complete the following information):

  • OS: Windows Server 2019
  • Browser: Firefox
  • Seq Version: 2024.3.13181
  • Using Docker? No

Please note that all Seq data is on disk D:. No other application installed on this server writes to disk D:.

Additional context
We could only recover by adding more storage to the (virtual) disk. Only Seq data resides on drive D:, and we didn't know whether any files were safe to remove temporarily.

Sebbl22 added the bug label Jan 13, 2025
liammclennan (Contributor) commented

Hi @Sebbl22

The limit means that Seq will stop ingesting when 500 MB of free space remains, but disk space can still be consumed by indexing, retention policies and a number of other processes. In your case 500 MB appears to be too low. I suggest 2 GB as a more suitable limit, and more for a high-volume server.
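Because the limit only stops ingestion, an external check on the data volume can give earlier warning. The sketch below is not a Seq feature; it assumes a monitoring script run separately, and the names `classify_free_space` and `check_volume`, the binary GiB unit, and the 2 GiB headroom are illustrative assumptions. It relies only on Python's standard `shutil.disk_usage`.

```python
import shutil

MIB = 1024 * 1024
GIB = 1024 * MIB

def classify_free_space(free_bytes, ingest_limit_bytes, headroom_bytes):
    """Return 'critical' at or below the ingestion limit, 'warn' within
    the headroom above it, and 'ok' otherwise."""
    if free_bytes <= ingest_limit_bytes:
        return "critical"
    if free_bytes <= ingest_limit_bytes + headroom_bytes:
        return "warn"
    return "ok"

def check_volume(path, ingest_limit_bytes=2 * GIB, headroom_bytes=2 * GIB):
    """Classify the free space on the volume containing `path`."""
    free = shutil.disk_usage(path).free
    return classify_free_space(free, ingest_limit_bytes, headroom_bytes), free

if __name__ == "__main__":
    # "." is a placeholder; point this at the Seq data volume instead.
    status, free = check_volume(".")
    print(f"{status}: {free / GIB:.1f} GiB free")
```

The headroom is there so that a warning fires while indexing and retention processing still have room to write, rather than only once ingestion has already stopped.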

nblumhardt (Member) commented

Just a note: 2025.1 will increase the default limit to 4 GB. We're still looking into additional ways to prevent or mitigate this (e.g. stopping indexing when low on disk space), and we'll loop back.
