Just as a random data point, I have several (I think valid) use cases for bulk imports.

RichiH on 17 Nov 2015

+1

grypyrg on 19 Jan 2016

This would be very useful for me as well. I understand that this could be used to blur the line between a true time-series event store and prometheus' communicated focus as a representation of recent monitoring state.

Its beneficial In the case where prometheus is the ingestion point for a fan out system involving influxdb or federated rollup prometheus nodes - this would allow me to just simply keep pumping all data through the prometheus entry points without having to have two input paths in the case where the data feed is delayed.

foic on 14 Feb 2016

👍2

@foic I can't think of a sane Prometheus setup where that'd be useful. If you want to pump data to influxdb, it'd be best to do it without involving Prometheus as Prometheus is adding no value in such a setup. Similarly rollup depends on continuous pulling, and delayed data can't work with that.

I'd generally adivse to run duplicates, and live with the odd gap. It's not worth the effort to try and get it perfect.

brian-brazil on 15 Feb 2016

👎23

Thanks @brian-brazil - this is pretty much an expected response :-) Sounds like there is
too much to change to make all the pieces work with historical data. Alert
manager, rollup etc etc.

Should this feature request be closed then if working with historical data
is too difficult?

On 15 February 2016 at 10:02, Brian Brazil [email protected] wrote:

@foic https://github.com/foic I can't think of a sane Prometheus setup
where that'd be useful. If you want to pump data to influxdb, it'd be best
to do it without involving Prometheus as Prometheus is adding no value in
such a setup. Similarly rollup depends on continuous pulling, and delayed
data can't work with that.

I'd generally adivse to run duplicates, and live with the odd gap. It's
not worth the effort to try and get it perfect.

—
Reply to this email directly or view it on GitHub
https://github.com/prometheus/prometheus/issues/535#issuecomment-183998667
.

foic on 15 Feb 2016

@foic What you're requesting is different to what this feature request is mainly about.

brian-brazil on 15 Feb 2016

There's some value to bulk import, even in a world where 'storage' isn't the intended purpose of Prometheus. For example...

Recently I've been working on a Prometheus configuration for a certain forum. Although some of the metrics are from PHP, most of the really useful ones are being exported by an nginx logtailer I wrote.

In order to quickly iterate on possible metrics, prior to putting it in production--that hasn't happened yet--I added code to the logtailer that lets it read logs with a time-offset, pausing between each record until it's "supposed" to happen. That's okay-ish, but it'd be much nicer if I could bulk import an entire day's worth of logs at once without actually waiting a day. Then I could look at the result, clear the DB, and try again.

There's the timestamp hack, but none of the client libraries support timestamps, and it's ugly anyway. I haven't tried to use it.

Baughn on 15 Feb 2016

👍2

@Baughn what do you mean by "timestamp hack"?

I have use for a bulk import endpoint as well, and that's for back-filling data that was interrupted/unavailable on the normal time flow.

Overall, I feel it might be somewhat on the border of what is the intended model of prometheus, but there will always be people with the need to diverge from the ideal setup or situation.

jinxcat on 15 Feb 2016

👍3

that's for back-filling data that was interrupted/unavailable on the normal time flow.

That's also not what this issue is about. This issue covers brand new data, with nothing newer than it in the database. It's also not backfilling data, which is when there's nothing older than it in the database.

We've never even discussed this variant.

brian-brazil on 15 Feb 2016

@Baughn what do you mean by "timestamp hack"?

The /metrics format allows specifying a timestamp in addition to the values. None of the clients support this, and Prometheus doesn't support adding values that are any older than than the newest one.

There's a list of caveats as long as your arm, starting with the impossibility of reliably doing this with multiple tasks exporting metrics, but in theory it should be possible to use timestamps to simulate fast-forwarding through historical data, which would cover my specific scenario.

I've never tried it, though.

Baughn on 15 Feb 2016

Per post-Promcon discussions, the consensus was to have a API that can take in a time series at a time.

brian-brazil on 26 Oct 2016

😄1

@brian-brazil thanks!

Currently it is need ask user to run systat every minute, after that ask him to dump sar results to file and send it. after that analyze results manually or via kSar tool.
if Prometheus realize importing it will be very-very useful!

delgod on 10 Feb 2017

That's not something we will support. When we said bulk we mean bulk.

I'd recommend you look at the node exporter for your needs, it'll produce better stats than sar.

brian-brazil on 10 Feb 2017

+1

svetasmirnova on 12 Feb 2017

+1

ssouris on 24 Feb 2017

This would be excellent for my use case. I assumed it was already possible by adding a custom time stamp as outlined in the 'Exposition Formats' page but I've since realized it doesn't work as expected. I've had to move away from Prometheus for my current project because of this but would be very interested in returning to use it in the future if this feature was implemented.

begakens on 27 Feb 2017

👍2

+for loading data based on server logs

radiophysicist on 3 Mar 2017

👍4

For logs look at mtail or the grok exporter. This is not suitable for logs.

brian-brazil on 3 Mar 2017

I tried grok and gave it up due to its impossible to use actual timestamps from log data

radiophysicist on 4 Mar 2017

+1, this would make Prometheus usable for more than just real-time server metrics/alerting. For instance, metrics from sensor networks might come in delayed, due to network availability. Via push. Also, there already is historical data that is valuable to import.

jstsch on 14 Mar 2017

👍6

I hacked up a tool which can do something like this as a proof of concept - you can pre-initialize a Prometheus data store by streaming timestamped text-exposition format metrics into it: https://github.com/wrouesnel/prometheus-prefiller

It basically just launches a prometheus storage engine as a library to do it.

EDIT: Taken to an end state, you'd imagine some sort of /api/v1/export endpoint which simply iterates from the dawn of time at a background priority until it syncs up to the ingressing metrics, and a "bootstrap" mode in Prometheus which takes a URL and calls that endpoint to prefill itself before "launching".

wrouesnel on 21 Mar 2017

👍5

I am looking for this feature to load synthetic test data (a.k.a. random garbage) for evaluation, prototyping, and (hopefully soon) development. I'll try the "prefiller" tool, it looks like it does what I need.

szocske42 on 19 Apr 2017

Per post-Promcon discussions, the consensus was to have a API that can take in a time series at a time.

With the new storage in Prometheus 2.0, this would not be the approach to take. I presume we'll do something more block based.

brian-brazil on 21 Aug 2017

I created a little tool https://github.com/Cleafy/promqueen slightly based on @wrouesnel https://github.com/wrouesnel/prometheus-prefiller in order to record and backfill offline stored metrics on a newly created database.

promrec creates timestamped metrics on file. promplay generates a new Prometheus database based on these metrics files.

thypon on 2 Oct 2017

👍9 ❤2

@thypon - nice to see my 1 night hack become something a bit more polished (it also contains some pretty egrerious misunderstandings of the metrics engine I now realize too :)

wrouesnel on 30 Nov 2017

+1

geraybos on 22 Dec 2017

👎3 👍1

+1

Matty9191 on 8 Jan 2018

👎2 👍1

promplay solves import, but is there any solution for export? Something like pg_dump in Postgres?

Sure, Prometheus storage was not intended to be a long term format, but not having dump-to-text and restore-from-text as standard included tools is pretty bad.

remote_read was mentioned in the migration docs — I assumed it would eagerly read the whole database from the old instance and save all the data in the new one… looks like it doesn't :(

(I want to migrate a small but long-term database from 2.0 beta 4 to 2.1 release…)

myfreeweb on 18 Feb 2018

👍3

+1

theTibi on 19 Feb 2018

👎2

is there a plan for this ? Thanks.

amit-handa on 20 Apr 2018

👍2

It's also not backfilling data, which is when there's nothing older than it in the database.

@brian-brazil -- new to Prometheus, bit confused by this statement. Are you implying there's a way to backfill already, or that there's not and this request isn't going to resolve that?

The scenario I have is wanting to use Prometheus to monitor data from now->forward, but also would like a way to backfill the data prior to now so I don't lose my historical records. Without a way to specify metric timestamps I'm unsure of how to go about this. Not having historical metrics though seems like a deal breaker for anyone wanting to transition from an existing monitoring system to Prometheus.

calebtote on 9 May 2018

👍4

There is currently not, and this request is about that feature.

brian-brazil on 9 May 2018

Any new on this feature ?
A good example of usage : data analysis
I have done a bad histogram split of my data with the statsd_exporter ... so i would like to re-import my past raw data changing the format ...
The only way i have is to start an influxdb, push the data and define a dedicated connection to these data ... quite heavy process.
Would be so simple to re-generate the good metrics data using a Prometheus API :-)

omerlin on 16 Aug 2018

👍7

One more use case - I have to setup Grafana on my local computer and I need to imitate real data for this, but real data comes once per day...
So, it will take a lot of time to waiting for this. I want to load sample data into Prometheus, then setup Grafana dashboard, export it and save into Grafana config file. And then use it in production environment.

parserpro on 22 Aug 2018

👍4

+1

hanbaga on 30 Aug 2018

👎1 👍1

+1

freddy4711 on 30 Aug 2018

👎1 👍1

I would like to work on this. This will be done after https://github.com/prometheus/tsdb/issues/90 and https://github.com/prometheus/tsdb/issues/24 are addressed, and I am on them now.

codesome on 31 Aug 2018

🎉6

@parserpro thats nearly the same use case i have. We are thinking about bundling Prometheus/Grafana into our docker-compose product stack. For selling and demo reasons, its quite necessary to have demo data before any real data enters the system, which will be never the case on a demo notebook of a sales rep.

logemann on 11 Sep 2018

whats a realistic ETA for this feature to make it into a release?

narciero on 19 Sep 2018

@narciero
This PR https://github.com/prometheus/tsdb/pull/370 is required to be merged for bulk import. But as this is not a small change in TSDB which has potential of breaking things, it would take some time to verify and test and iterate on possible improvements.

You can safely assume that it will be at least 1-2 months (including the time for deciding on design of bulk import).

codesome on 19 Sep 2018

understood, thanks for the update!

narciero on 19 Sep 2018

@codesome why do you need prometheus/tsdb#370 for the bulk import?
Quickly reading trough the comments this relates to bulk imports to a new Prometheus server without any data in it so it will import data in order(ordered timestamps and nothing in the past) which should be possible even with the current tsdb package.

krasi-georgiev on 9 Nov 2018

If we are allowing bulk import, I think we need to support for every case and not only for empty storage. We need https://github.com/prometheus/tsdb/pull/370 to allow import of any time range.

But yes, valid point. We can do bulk import even with current tsdb packages, but we need to implement that part in prometheus/prometheus. I would like to do it after the above mentioned PR so that import is seamless.

codesome on 9 Nov 2018

Hi All,

I am new to Prometheus. I have read @brian-brazil's article on safari and I thought this post might be a good place to ask my question.

I have some sensor data with timestamps and other features (location etc) and I would like to insert these data to Prometheus using Python API, then connect with Grafana to visualize. It might be overshooting, but since I already have Prometheus as a Docker container, I thought I can use it as a DB to store the data. Can I do it? or do you advise to set up another DB to store the data then connect with Grafana?

I saw @thypon answer but, unfortunately, I don't know Go.

Sincerely

Guven

guvenim on 26 Dec 2018

We use github for bug reports and feature requests so I suggest you move this to our user mailing list.

If you haven't looked already you might find your answer in the official docs and examples or by searching in the users or devs groups.
The #prometheus IRC channel is also a great place to mix up with the community and ask questions (don't be afraid to answers few while waiting).

krasi-georgiev on 26 Dec 2018

Any news on this ? We want to migrate data from opentsdb into prometheus and it would be nice to have a way to import old data

jomach on 6 Feb 2019

we are actively working on prometheus/tsdb#370 and once implemented in Prometheus you could take blocks from another Prometheus server just drop them in the data folder and it will all be handled when querying and blocks will be merged at the next automated compaction trigger.

No strict ETA, but it looks like we might be able to add this to 2.8 which should be in about a month.

krasi-georgiev on 6 Feb 2019

@krasi-georgiev Just to be clear on your comment, is the expectation that you have to import from another Prometheus instance (but we still couldn't script our own bulk imports with epoch:data from other tsdbs)?

calebtote on 6 Feb 2019

Yes, after https://github.com/prometheus/tsdb/pull/370 is merged, I will be jumping directly into implementing bulk import.

@calebtote No, I think bulk import would support importing from non-Prometheus source too. The design has not been decided yet.

codesome on 6 Feb 2019

aaah yeah I missed that part about the opentsdb. Don't think we have done anything in this direction yet.

krasi-georgiev on 6 Feb 2019

@codesome I'm with @calebtot here. In our use case we have metrics on the opentsdb server we want to migrate to opentsdb and don't loose metrics

jomach on 7 Feb 2019

Hi I'm not sure if what I'm trying to do is supported or not yet.
In jenkins we use folders as teams/projects and subfolders as subprojects and then all the jobs are inside subfolders.
I have a python script which summarize the build status (SUCCESS, FAILURE,UNSTABLE), for all the jobs belonging to that team (root folder), that is done but now I want to also collect the timestamp of those metrics as I want to be able to see the builds status of each project selecting the period (year, monthly, weekly, daily).
Is that possible to do? I'm publishing all the metrics as gauge values.

This is an example of what I'm publishing for Prometheus:

HELP Jenkins_metrics_project_team01 Metrics by project

TYPE Jenkins_metrics_project_team01 gauge

Jenkins_metrics_project_team01{status="success"} 13.0
Jenkins_metrics_project_team01{status="failures"} 22.0
Jenkins_metrics_project_team01{status="unstable"} 10.0

HELP Jenkins_metrics_project_team02 Metrics by project

TYPE Jenkins_metrics_project_team02 gauge

Jenkins_metrics_project_team02{status="success"} 0.0
Jenkins_metrics_project_team02{status="failures"} 0.0
Jenkins_metrics_project_team02{status="unstable"} 0.0

cesarcabral on 7 Feb 2019

@cesarcabral Runs of batch jobs like that are usually tracked by encoding a Unix timestamp into the sample value (rather than the sample timestamp), see e.g. https://www.digitalocean.com/community/tutorials/how-to-query-prometheus-on-ubuntu-14-04-part-2#step-4-%E2%80%94-working-with-timestamp-metrics.

juliusv on 7 Feb 2019

@cesarcabral Runs of batch jobs like that are usually tracked by encoding a Unix timestamp into the sample value (rather than the sample timestamp), see e.g. https://www.digitalocean.com/community/tutorials/how-to-query-prometheus-on-ubuntu-14-04-part-2#step-4-%E2%80%94-working-with-timestamp-metrics.

Thank you Julius, nice articles by the way.

cesarcabral on 15 Feb 2019

Note: prometheus/tsdb#370 is merged. and there is #5292 to include it in prometheus.

roidelapluie on 3 Mar 2019

And with this project in GSoC, we can expect to have a better user-facing package for easy imports from other monitoring stacks.

codesome on 3 Mar 2019

So after some student implements this feature this(?) summer, it is reasonable to expect something along the lines of a /api/v1/admin/tsdb/insert_samples API call that I can call with a mapping of values like {timestamp1: value1, timestamp2: value2, ...} a series name and a mapping of {tag_name1: tag_value1, tag_name2: tag_value2, ...} tags?

MarkusTeufelberger on 5 Apr 2019

No, I'd expect this to be more a command line thing as it's messing with blocks.

brian-brazil on 5 Apr 2019

Ok, then no API call... but in general the use case of "I have some timestamped values and want to insert these into Prometheus at a certain series + with the following tags, I do something (API call, command line call...) and then I have them available in Prometheus" is what this student should code? I'm asking this because "Package for bulk imports" sounds to me like building yet another building block to add support for bulk imports, yet still not enabling users to do bulk imports.

As a side note, I'd also like to point out that for 4 years the name of this issue is "Add API for bulk imports"...

MarkusTeufelberger on 5 Apr 2019

❤4

@MarkusTeufelberger Good point, I renamed the issue to "Add mechanism to perform bulk imports".

juliusv on 5 Apr 2019

The goal is to allow for bulk imports, not to change Prometheus into a
push-based system. The implementation will likely not work with data in the
past few hours as they won't be on blocks yet.

On Fri 5 Apr 2019, 13:26 MarkusTeufelberger, notifications@github.com
wrote:

Ok, then no API call... but in general the use case of "I have some
timestamped values and want to insert these into Prometheus at a certain
series + with the following tags, I do something (API call, command line
call...) and then I have them available in Prometheus" is what this student
should code? I'm asking this because "Package for bulk imports" sounds to
me like building yet another building block to add support for bulk
imports, yet still not enabling users to do bulk imports.

As a side note, I'd also like to point out that for 4 years the name of
this issue is "Add API for bulk imports"...

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/prometheus/prometheus/issues/535#issuecomment-480256743,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGyTdjEcX6sQXSVBo4XcxpzsERsy1iBJks5vd0D1gaJpZM4Dhxb4
.

brian-brazil on 5 Apr 2019

I imagine the cli will just create a new block which you can just add in the data dir of the Prometheus server where you want to bulk import.
The data will be available after a Prometheus restart. and after the first compaction the overlapping blocks would be merged by removing the duplicated data.

krasi-georgiev on 5 Apr 2019

I presume we'll trigger a reload explicitly, rather than wait for one.

On Fri 5 Apr 2019, 13:56 Krasi Georgiev, notifications@github.com wrote:

I imagine the cli will just dump a new block which you can just add it in
the data dir of the Prometheus server where you want to bulk import.
The data will be available after a Prometheus restart. and after the first
compaction the overlapping blocks would be merged by removing the
duplicated data.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/prometheus/prometheus/issues/535#issuecomment-480265271,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGyTdnMRcDcUFpyUMhHcEpS1dDTZHee8ks5vd0gQgaJpZM4Dhxb4
.

brian-brazil on 5 Apr 2019

any progress？ able to import history would be great

zhxiaom5 on 29 Apr 2019

Phase 1 of GSOC is over, has someone taken up this task and how far along are they?

MarkusTeufelberger on 8 Jul 2019

Don't think anyone is working on this as part of the GSOC.

krasi-georgiev on 8 Jul 2019

😕1

I made a prototype for this issue about 1 year ago. The main idea is filling up the remote_write target from Prometheus text exposition format, skipping the Prometheus server. See:
https://github.com/pgillich/prometheus_text-to-remote_write

pgillich on 12 Jul 2019

there is a tsdb cli tool and would be fairly easy to make it read a file of some format and create tsdb blocks from it. Than these blocks can be added to another Prometheus server an will be merged at the next compaction.

krasi-georgiev on 13 Jul 2019

Maybe it work? If yes, how to do it?

We have some historical datas in other sub-system. So we want to migrate them to prometheus and use prometheus as our monitoring system, but those historical datas must not be discarded. If there is a certain way to allow us to import them into prometheus, that's really awesome. Or, we have no way to switch to prometheus.

xgfone on 18 Jul 2019

@xgfone as per my last comment feel free to open a PR to the tsdb cli tool https://github.com/prometheus/tsdb/tree/master/cmd/tsdb

krasi-georgiev on 18 Jul 2019

As far as i understand, https://github.com/prometheus/tsdb/tree/master/cmd/tsdb fills the internal Prometheus database, which has about 2 weeks retention time (by default).

I made a prototype in https://github.com/pgillich/prometheus_text-to-remote_write to fill up external databases, see the list of remote_write targets: https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage

pgillich on 18 Jul 2019

@xgfone I've created a PR that allows importing data that is formatted according to the Prometheus exposition format. https://github.com/prometheus/tsdb/pull/671

You could gather the data that you want to expose, and create an exporter that writes to a text file, appending timestamps, and then use this TSBD CLI utility to then import.

dipack95 on 5 Aug 2019

Can everyone here share what would he data format they want to use to import to tsdb?

json, Prometheus format (expfmt) or any other?

krasi-georgiev on 6 Aug 2019

👍1

@dipack95 tks. I will test it later for our case.

xgfone on 6 Aug 2019

I would find this very useful - it would enable me to backfill my reporting data from JIRA while at the same time writing my exporter tool for JIRA.

johncclayton on 27 Sep 2019

Can everyone here share what would he data format they want to use to import to tsdb?

json, Prometheus format (expfmt) or any other?

Prometheus format is fine I think as it's also what https://github.com/Cleafy/promqueen/ promrec is creating with some additional frame header (to store the timestamp) + http header

By the way, as I haven't seen that mentioned here as a use case. Think about on premise software recording / dumping metrics to disk every x seconds and for a support issue, the dumps are zipped and sent for further analysis.

On the receiving side those dumps can be bulk-imported into an empty Prometheus instance and then queried to find out what was going on at a specific point in time.

We've used promrecs format and promplay on our side to generate a 1.8 data dir and after that starting a dedicated Prometheus instance for that case.

svogt on 9 Oct 2019

Think about on premise software recording / dumping metrics to disk every x seconds and for a support issue, the dumps are zipped and sent for further analysis.

You could send the existing blocks around for that.

brian-brazil on 9 Oct 2019

Which would mean we would have to run prometheus on-premise. Quite some overhead.
Would need more RAM, we'd have to start and monitor an additional process, etc. pp.

Compare that to: we have a Java thread running which just dumps the registry every 60 seconds and appends that to a file (one file per day) - once the current day is done zip it and delete all files older than our customizable retention time...

Also we would need to update Prometheus and in case the storage format changes again, run two versions in parallel, etc. pp - managing that on >1000 installs and also on 5 different OSes (yes we have customers on AIX, HP-UX and Solaris as well) seems like quite some additional maintenance efforts compared to dumping to a file and later importing them...

svogt on 9 Oct 2019

Btw: Yes, I've already considered reimplementing the on-disk-format in Java and creating blocks from Java and sending them around :) Up to now I tried to avoid it, as promplay is working okay enough for us

svogt on 9 Oct 2019

In my case I have simple needs. I have a a JIRA system for which I've made
up a bunch of queries that make sense for my needs - all around the topic
of understanding how long it takes us to handle support cases or how long
dev teams are spending doing certain things.

My goal is to start monitoring these metrics - so it would be extremely
useful to take all the existing data (it's only 10 months or so in my case)
and see where the trends lie. Back filling is the key to making this
easily possible using Prometheus.

Data format - I don't have a preference, as long as there is great
documentation that shows a few samples I can make my data fit that. Having
said that - JSON seems a sane choice if the backfilled data is coming from
LOTS of different systems as it's a thing that most people understand and
there's no shortage of tooling in any programming language, which would
lower the barrier to entry somewhat.

johncclayton on 9 Oct 2019

for the moment we have already settled on the prometheus format(expfmt) so for any other formats will need to open a separate PR

krasi-georgiev on 9 Oct 2019

Fine with me - is there a way I can test the current features for
backfilling?

johncclayton on 15 Oct 2019

Next prometheus release will include this in the promtool or you can build this PR
https://github.com/prometheus/prometheus/pull/5887

krasi-georgiev on 15 Oct 2019

Ah, great! I've never built any of this so - lets see how far I get.

johncclayton on 15 Oct 2019

Hi, I believe this ticket can be closed, why we are keeping this?

While we can talk about different tools for different formats to develop in promtool or in prometheus-community org, the bulk import itself is already done via vertical compaction: https://github.com/prometheus-junkyard/tsdb/issues/90 Anyone can add an arbitrary block in TSDB local storage and Prometheus will use it.

I would say:

Let's close this issue as done.
Let's talk about improvements: I am working on some design doc about improvements.
We have still rule reveal as it's not yet available with any Prometheus community or promtool project (however there is https://github.com/yeya24/backfiller)
In a similar note please add separate issues for any missing format that might be missing for bulk import on top of expfmt we are building now. We can coordinate either contribution or help develop or add some doc page for the importers section where we can enumerate existing importers.

WDYT? (:

bwplotka on 13 Apr 2020

Why do you want to close this before #5887 is merged?

roidelapluie on 13 Apr 2020

Because https://github.com/prometheus/prometheus/pull/5887 is unrelated to this issue IMO.

We just created a custom file format that no one supports currently. In the same way, TSDB block was our format before. So this PR does not help with this issue more in any form. I think overall bulk import is done. There should be separate tracking issue for different importers like custom file format we are adding or CSV, or JSON etc.

I cannot see really any user asking for the custom file format, so why we are doing it? ): What am I missing?

I just want clarity on what is missing and what the community wants. (: I feel generic bulk import issue is not helpful. Bulk import in SOME way was done, so what we are waiting for here? (:

bwplotka on 13 Apr 2020

Hi, @bwplotka. I haven't been keeping up with the state of the project, since I wrote that PR, but if I understand correctly, you don't use the TSDB block format anymore?

dipack95 on 13 Apr 2020

I think overall bulk import is done.

Let's say I have data with timestamps from last week in a plain old boring CSV file that I want to import to prometheus 2.17.1 (the current latest release). What are the concrete steps (API calls, commandline tools...?) I need to take until I can query for it in a running server?

MarkusTeufelberger on 13 Apr 2020

@bwplotka this issue has 101 :+1: and vertical compaction is only part of the answer.

roidelapluie on 13 Apr 2020

@dipack95 we use. That's why this is our API.

@MarkusTeufelberger @roidelapluie that's the point, we should talk about concrete. The API described by @juliusv already exists so all those 101 👍 are satisfied unless we will finally talk about details on how to improve, ideally in separate issue with clear action items. (:

I have created a document with some details about the missing things, still in progress though. https://docs.google.com/document/d/15wKdeFntgUmEnV8kqQ_5iQEIhXnwU8HNm0NKlkR0xk4/edit?usp=drivesdk

Let's create another issues to talk about what's needed in separate threads, will look on that tomorrow. (:

bwplotka on 13 Apr 2020

No. I am NOT satisfied by just another few YEARS at this point of "well, it is anyways somehow maybe with squinting eyes possible to...". The issue here is good enough: There should be one easy way to ingest existing historic data into Prometheus. Initially it was thought to be a web API, then it was relaxed to be done via a command line tool. There isn't. You can't even point at a way to do it in a few pages of Google Docs. Not one example, not one proof of concept, nothing. Just a link to an issue in an aptly named "junkyard" organization and a blanket statement of "Upload arbitrary block with series in TSDB format into local TSDB storage.". No command on how to generate such a block from existing data, no link to documentation on how to import it.

Sorry, if this sounds angry, but this is a feature that I've been waiting for for a long time now and I refuse to believe anything other than a concrete set of steps to follow at this point. After all it was said years ago (in 2016) that this should happen and be implemented. Half a year ago #5887 was started. Still not merged. Now you want to close this issue and start a whole other discussion again including stuff like "must work as a service instead of CLI only"?!

MarkusTeufelberger on 13 Apr 2020

👍8

The last comment reads a bit like there was money paid in exchange for a contractual obligation to provide goods or services, which were not provided.

Aside from that I agree that the initial issue I described up top has not been solved yet. AFAIK there is neither an API (from the perspective of a normal user) nor a command in promtool or similar tool that a user can use to import data in a format that is common outside of Prometheus's own binary formats. Correct me if I'm wrong, but I'd also be for reopening this if my understanding is correct.

juliusv on 14 Apr 2020

I just have to say that I spent a decent amount of time on #5887, and while @krasi-georgiev helped me a TON by reviewing my PR multiple times, he can no longer review my PR, and as a result I have no clue what the current status is, or if it is even useful at all, based on @bwplotka's comments.

Of course, that tool means that there has to be some non-trivial setup by the end-user to import data that is not in a Prometheus compatible format, but I believe it is still very useful.

dipack95 on 14 Apr 2020

I totally see your anger @MarkusTeufelberger. However,.. did you do anything to help us (and yourself) to enable this? "Help others so they can help you".

Anyway you just helped with your comment (: This is exactly what we were missing here. Concrete feedback. That we are missing basic docs, basic examples etc.

That's why I started https://docs.google.com/document/d/15wKdeFntgUmEnV8kqQ_5iQEIhXnwU8HNm0NKlkR0xk4/edit#heading=h.u5547sl673xx, so please if you can, suggest us what would help.

Half a year ago #5887 was started. Still not merge

Good point. But to me, this is not solving any of your complaints. Does it? That's why there was no movement on it. And if you are interested in it, did you bother to even review that PR or comment that you want it? (: You know, any of those things would help us! :hugs:

@dipack95 Yes, and thank you for that. However, I don't think it was well coordinated and thought through. That's why you did not receive enough reviews and sorry for that. Let's think through WHAT would help us here. That's why I vote for closing this issue and starting ~4 new ones! With the exact task to do to enable bulk import use cases. I am happy to work towards that. Plus @dipack95 no work was wasted. I actually used your description several times to construct design etc Thanks!

@MarkusTeufelberger do you mind starting issue, with the description of perfect API you would like to see? Why CLI is not enough? Or if it's enough, what data format would be acceptable? From my point, I would suggest something "standard" like CSV or even remote write. WDYT?

bwplotka on 14 Apr 2020

I just need something - API, CLI one-time-use tool or longer running service, I don't really care at this point - where I can send a timestamp, a value and a few labels (ideally also: a list of timestamp:value mappings and a few labels) and have this data eventually show up in prometheus. I don't care about the data format (though it likely would be close to the one prometheus uses internally to minimize conversation losses), I don't care about the exact process, I just want to be able to import existing historic data into prometheus. It is somewhat absurd that it is easier to send this data to an InfluxDB instance and use the remote_read feature there instead of being able to use prometheus itself to add time series data to a time series database.

Currently it seems like I should use promtool or something else (tsdb?) to create a TSDB file somehow and then put that resulting file manually into the folder of a running prometheus server which then eventually would either merge this into its store on the next compaction or just leave it in there as part of its own database. I don't see any way to create TSDB files withpromtool or tsdb though at the moment. How else would "Upload arbitrary block with series in TSDB format into local TSDB storage" work and especially how are these TSDB formatted blocks created? The brute-force solution would by the way be to just write one index + chunk file per data line. It might even be viable for smaller volumes of data and if it gets cleared up by compaction anyways...?

Going on a slight tangent here, but maybe as an easier fix it could be possible to (optionally) fetch values from a remote_read endpoint that are still within the configured retention time via an API call. That way it would be possible to import historic data into more import-friendly systems like InfluxDB first, yet have the option of having that data end up in prometheus' database eventually (e.g. by issuing a "fetch" API call that queries + imports all data from a certain set of labels within retention range). That way the "prometheus is pull only!" directive wouldn't be violated and most existing components could already be reused. Ideally there would be a way to write TSDB files directly, but that seems to not go anywhere soon, has limitations (pre-sorting, no defined formats yet...) and would just replicate efforts that other time series databases also already have done on a more general level (I don't need to pre-sort samples for import into Influx for example as far as I know).

MarkusTeufelberger on 15 Apr 2020

👍1

Please note that influx is a general purpose time series database and Prometheus is a monitoring solution.

It seems that your try to import data at a frequent interval, which is not the goal of this issue. This issue is about one-off import of monitoring data.

roidelapluie on 15 Apr 2020

Yes, @MarkusTeufelberger it sounds like a different use case indeed. Can we hear more about the use case of yours? Why you have historic data somehow that often? It looks rather like a case for a good exporter instead of periodic imports. (:

bwplotka on 15 Apr 2020

For the record, I disagree that Prometheus is only a monitoring framework (not solution) at this point. Personally, I use it for analysis and historic reference more often than alerting.

Furthermore, I would argue that our users do not need to justify their specific use of Prometheus.

That being said "Power, cooling, and availability data in a datacenter" would be a good example. This data goes back years, decades even. If you buy a new existing site or migrate an old one, you need to import data and you would usually do this in batches, not in one huge run.

RichiH on 15 Apr 2020

👍8 ❤1

My use case is importing historic data once. To be exact it would be data extracted from blockchains like Bitcoin that are still going on (so new data is ingested live) but that have already some history that I also want to be able to query. I might need to import a different data set as a one-off for a different system, but not the same data for the same system regularly. I do NOT need to import "often" or "at frequent intervals" and I don't think I ever said that I do? I would even not care if the import takes days or weeks to complete in case it is so inefficient that I have to import the data as single samples instead of blocks of data as long as it is possible at all.

MarkusTeufelberger on 15 Apr 2020

Data point: I work on clangd, which is a (non-network) server used by editors.
At work, we monitor instances with an internal prometheus-like system.

We'd like to provide something for open-source users to do ad-hoc analysis of their sessions. It needs to have few moving pieces (no network dependencies, OS-specific setup etc).

Our data model looks a lot like the prometheus exposition format, and we could easily write this format with timestamps. However lacking a way to import it, there seems to be little point.

Our current plan is to write CSV of all recorded events, and direct people to analyze it in excel.
If an import tool existed, I think we'd support prometheus exposition format and recommend using that instead.
(If the tool could watch a text file for appends (like tail -f) then it could even be real-time, which would be neat)

sam-mccall on 20 Apr 2020

@sam-mccall That sounds like you're looking for an ongoing push solution, which is not what this issue is about. This issue is about one-off bulk imports.

brian-brazil on 20 Apr 2020

I understood! Prometheus told us one thing: it only supports the PULL mode, not PUSH, forever! For any proposal about PUSH, it will be rejected to accept.

PULL or PUSH? I follow @RichiH .

xgfone on 20 Apr 2020

I would not say rejected, but rather enabled somewhere else and letting Prometheus focus on what it is doing best, scraping/pulling metrics (:

For push, you have tons of Prometheus-based systems like Thanos or Cortex both are reusing lots of code (e.g PromQL) of Prometheus and enabling almost the same HTTP read APIs. Both of those do not pull metrics but allows to push them e.g via Prometheus, but anyone else can push as well.

So instead of having single projects being one-tool-for-everything (and thus for nothing, because it's almost impossible to make something efficient that is fully generic), you have others that integrate well with each other. :hugs:

Anyway, for this issue, we want to solve the exact case of "ups I forgot to add this data to Prometheus (or Thanos or Cortex)" or "let's migrate my old data from DB X to Prometheus/Thanos/Cortex)"

bwplotka on 20 Apr 2020

❤1

I don't think that pulling and scraping metrics is what prometheus is doing best, for the record. Most exporters don't even support authentication and prometheus is also relatively inflexible when it comes to scrape configuration (e.g. only fixed intervals instead of dynamic ones, only text format supported, no parameter support in official libraries...). The thing I like about prometheus is the data analysis and alerting part as well as the relative flexibility when it comes to labels etc. (but that comes with a price as well of course) making it more suitable for monitoring dynamic infrastructures or dynamic portions of static infra. I couldn't care less if these data were pushed or pulled into the server before they show up there, as long as they are somewhat reliably showing up.

The minimal viable product I'd like to see is something (tsdb write ...?) to write a single metric at a single point in time with a single set of labels to a database file. As far as I understand it, prometheus compaction can take it from there. No need for sorting, no need for multiple metrics, no need for any custom format, just a couple of command line parameters in an existing tool. Probably won't be able to import second-by-second data of every rack in a datacenter of the past decades within fractions of a second, but for importing a few thousand measurements it should be plenty enough. Everything else (e.g. a format to encode value:timestamp mappings to import more than one value or a way to import different values into different sets of labels) can come later after discussions, design documents etc.

MarkusTeufelberger on 20 Apr 2020

I don't think that pulling and scraping metrics is what prometheus is doing best, for the record. Most exporters don't even support authentication and prometheus is also relatively inflexible when it comes to scrape configuration (e.g. only fixed intervals instead of dynamic ones, only text format supported, no parameter support in official libraries...).

Arguable, but anyone can have its own opinion, thanks for sharing. (:

The minimal viable product I'd like to see is something (tsdb write ...?)

Yea, that makes sense. Implementation-wise I can see a couple of problems, but overall generally it looks similar to what I would see as a client (or even embedded client) to promtool receive which would receive simple remote-write API requests. (:

Overall our survey started at https://github.com/prometheus/prometheus/issues/7119 is finishing, and remote write is the highest voted requirement. The next one is CSV. I will sum up results and we will definitely have a large discussion among maintainers soon (dev summit) to decide what to do next, but looks like CSV and something like tsdb write and promtool receive are plausible solutions.

bwplotka on 20 Apr 2020

@sam-mccall That sounds like you're looking for an ongoing push solution, which is not what this issue is about. This issue is about one-off bulk imports.

@brian-brazil Nope, we'd be happy with one-off bulk imports - run a session, import it, analyze it.

Ongoing push would be nice-to-have and seems related, but certainly not needed.

sam-mccall on 20 Apr 2020

I wouldn't consider it to be one off if you're doing it regularly after every session.

brian-brazil on 20 Apr 2020

I have a set of devices that handle network traffic and generate counter information as traffic information with timestamp.
This data becomes available periodically (once a few hours) to be able to provide to prometheus and not in real-time.

Flexibiliy of labels and promql is the key reason to use prometheus.

A means to feed these counters into prometheus with old timestamp is necessary to make such use-case work. I did go through the google doc that is referred to in the earlier discussions and the links referenced in it. I could not find a mechanism to address it.

It is OK if it is via web api or cli, if all counters must be grouped for their timestamp or can be a mix OR directly in tsdb format. Having "some means" is what I am trying to figure out.

shivakumargn on 30 Apr 2020

I've been recently looking for a solution to instrument a web application; I thought about using Prometheus and eventually stumbled upon this issue. I needed to put timers around certain pieces of code and was willing to push this data into a Prometheus-like system (multiple timers from each HTTP call). Having one value per scraping doesn't really help here; neither histogram nor summary look like a good fit, either: the former needs absolute boundaries, the latter can't be aggregated and is calculated on the client, I rather needed a way to have every point from all over the cluster on a server and run occasional aggregation there.

I'll be considering VictoriaMetrics: at a glance it fills the gap for me. Maybe this hint will also help someone who goes down a similar path.

Well, I hope it doesn't look offensive to post links to other solutions here :roll_eyes: Even though the issue is about bulk loading, as I understood from the discussion here, I'm not the only one who got here looking for a way to push values into a time series.

kamazee on 30 Apr 2020

@kamazee That is a general usage question, you'd be best asking it on the prometheus-users mailing list as it sounds like you're looking for something that is not a metrics system at all.

brian-brazil on 30 Apr 2020

Closed by https://github.com/prometheus/prometheus/pull/8084

codesome on 26 Nov 2020

Prometheus: Add mechanism to perform bulk imports

Most helpful comment

All 112 comments

HELP Jenkins_metrics_project_team01 Metrics by project

TYPE Jenkins_metrics_project_team01 gauge

HELP Jenkins_metrics_project_team02 Metrics by project

TYPE Jenkins_metrics_project_team02 gauge

I would say:

Related issues