I would have to check the documentation again, but as far as I remember the carbon-cache daemon holds all incoming data in memory for the duration of the flush interval. When it writes the data to disk, the whisper library aligns each timestamp to the start of its interval (timestamp - (timestamp % interval)), so multiple values for the same key within one interval end up in the same slot and only the last one written is kept.
So, if you have an interval of 60 seconds, just one point will be written per minute.
There are a few exceptions, though, for example if the carbon-cache daemon flushes the data more often than that. So you can generally rely on Graphite's deduplication, as long as you keep those rare exceptions in mind.
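To make the alignment a bit more concrete, here is a minimal sketch in Python. It is not whisper's actual code, only an illustration of the idea as I understand it: timestamps are rounded down to the start of their interval, and a later value for the same slot overwrites the earlier one. The function names `align` and `write_points` are made up for this example.

```python
def align(timestamp: int, interval: int) -> int:
    """Round a Unix timestamp down to the start of its interval."""
    return timestamp - (timestamp % interval)


def write_points(points, interval=60):
    """Keep one value per interval slot; a later point overwrites an earlier one."""
    slots = {}
    for timestamp, value in points:
        slots[align(timestamp, interval)] = value
    return slots


# Two points fall into the slot starting at 120, one into the slot starting at 180.
points = [(120, 1.0), (150, 2.0), (185, 3.0)]
print(write_points(points))  # {120: 2.0, 180: 3.0}
```

With a 60 second interval, the values sent at 120 and 150 share a slot, so only the second one survives.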
Here is what our prod system looks like:
- 2 Icinga master nodes
- 2 nodes with Graphite and Grafana
Both Icinga nodes run a local carbon-relay. carbon-relay lets you specify multiple targets, so each relay knows about both Graphite nodes and duplicates the data so that both nodes receive the same data. The relays also cache the data for a limited time in case of outages or short network interruptions. If you experience larger outages, network partitions or other events that prevent the relay from sending the data to the carbon-caches, you should take a look at the utilities from carbonate: https://github.com/graphite-project/carbonate
They are really good and useful if you want to align the data across graphite instances, backfill or migrate data for example.
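For anyone who wants a mental model of what the relays do in this setup, here is a rough Python sketch. It is a toy, not carbon-relay's implementation: the class `FanOutRelay`, the host names `graphite-a` and `graphite-b`, and the choice to drop the oldest line when a buffer is full are all assumptions for illustration. The only part taken from Graphite itself is the plaintext line format (`path value timestamp`).

```python
from collections import deque


def format_line(path: str, value: float, timestamp: int) -> str:
    """Format one data point in carbon's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"


class FanOutRelay:
    """Toy relay: every line goes to every destination, each with a bounded buffer."""

    def __init__(self, destinations, max_queue=1000):
        # Assumption for this sketch: deque(maxlen=...) drops the oldest line
        # once the buffer is full. The real relay may behave differently.
        self.queues = {dest: deque(maxlen=max_queue) for dest in destinations}

    def submit(self, line: str) -> None:
        """Queue the same line for every destination."""
        for queue in self.queues.values():
            queue.append(line)

    def flush(self, dest: str, send) -> None:
        """Deliver buffered lines to one destination; stop at the first failure."""
        queue = self.queues[dest]
        while queue:
            if not send(queue[0]):
                break  # destination unreachable, keep the rest buffered
            queue.popleft()


# Hypothetical host names standing in for the two Graphite nodes.
relay = FanOutRelay(["graphite-a", "graphite-b"], max_queue=1000)
relay.submit(format_line("icinga2.host1.load", 0.5, 120))
```

Here `send` is any callable that returns `True` on success. If one Graphite node is down, its buffer keeps filling while the other node still receives everything, and once the buffer limit is hit you lose data on that node, which is the point where the carbonate tools come in.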
Edit: This is just my recommendation if you do not need, or do not have the resources for, a full-blown Graphite cluster.