
29 May 2019

Tuning G1 GC for Cassandra

Tuning G1 GC for Cassandra is too complicated, but it can make a big difference in cluster health.

Symptoms:

  • High p99 read/write latencies (because of long GC pauses)
  • High CPU causing lower read throughput (because of low GC throughput)
  • Dropped mutations (because of full GC collections on write-heavy clusters)
Here are some options that made a difference for me:
  • JVM: options for getting GC details out for inspection
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -Xloggc:/var/log/cassandra/gc.log
  • JVM: options for having enough buffer for collections
    # Pre-allocate full heap
    # Pre-size new size for high-throughput young collections
    -Xms24G
    -Xmx24G
    -Xmn8G
  • JVM: options for avoiding longer pauses (push work off the pause where possible)
    # Have the JVM do less remembered set work during STW, instead
    # preferring concurrent refinement
    -XX:G1RSetUpdatingPauseTimePercent=5
    # Process soft/weak/phantom references with multiple threads to
    # shorten the reference-processing phase of the pause
    -XX:+ParallelRefProcEnabled
  • JVM: options for better young collection throughput (avoid copying short-lived objects)
    # Save CPU time by avoiding copying objects repeatedly
    # Improve collection throughput by making heap regions larger
    -XX:MaxTenuringThreshold=1
    -XX:G1HeapRegionSize=32m
  • JVM: option cocktail to reduce risk of long mixed collections
    # Avoid to-space exhaustion by starting sooner, capping new size, and being more aggressive during mixed collections
    -XX:InitiatingHeapOccupancyPercent=40
    -XX:+UnlockExperimentalVMOptions
    -XX:G1MaxNewSizePercent=50
    -XX:G1MixedGCLiveThresholdPercent=50
    -XX:G1MixedGCCountTarget=32
    -XX:G1OldCSetRegionThresholdPercent=5
    # Reduce pause time target to make mixed collections shorter
    -XX:MaxGCPauseMillis=300
  • JVM: option to reserve extra buffer for allocation emergencies
    # Reserve extra heap space to reduce risk of to-space overflows
    -XX:G1ReservePercent=20
  • JVM: options for top collection throughput
    # Max out the parallel effort during STW pauses and concurrent marking
    # Set both to the number of cores
    -XX:ParallelGCThreads=16
    -XX:ConcGCThreads=16
  • Cassandra: option to avoid excess spikes of garbage from compaction
    # Reduce the rate of garbage generation and the CPU used for compaction
    compaction_throughput_mb_per_sec: 2
  • Cassandra: option to aggressively flush to disk on write-heavy clusters
    # Reduce amount of memtable heap load to reduce object copying
    memtable_heap_space_in_mb: 1024  # instead of default 1/4 of heap
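With GC logging enabled as above, the pause durations needed for the tuning process below can be pulled straight out of gc.log. A minimal sketch, assuming the JDK 8 single-line G1 summary format produced by -XX:+PrintGCDetails with date stamps (the exact format varies by JVM version, so adjust the pattern for yours):

```python
import re

# Assumption: JDK 8 G1 summary lines, e.g.
#   2019-05-29T10:00:00.123+0000: 12.345: [GC pause (G1 Evacuation Pause) (young), 0.0521301 secs]
PAUSE_RE = re.compile(
    r"\[GC pause \([^)]*\) \((young|mixed)\)"
    r"(?: \(to-space exhausted\))?, ([0-9.]+) secs\]"
)

def pause_times(lines):
    """Yield (kind, pause_ms) for each young/mixed collection summary line."""
    for line in lines:
        m = PAUSE_RE.search(line)
        if m:
            yield m.group(1), float(m.group(2)) * 1000.0

# Usage: feed it the whole log file, e.g.
#   with open("/var/log/cassandra/gc.log") as f:
#       for kind, ms in pause_times(f):
#           print(kind, ms)
```

The `(to-space exhausted)` alternative in the pattern matters: those lines flag exactly the evacuation failures the settings above are trying to prevent, so you want them counted, not silently skipped.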
The net effect of the above combined settings is as follows:
  • for a read-heavy cluster on i3.4xlarge:
    • young collection p90 pause times around 50ms
    • mixed collection p90 pause times around 90ms
    • no Full GCs, no dropped mutations
  • for write-heavy clusters on r5.2xlarge:
    • young collection p90 pause times around 175ms
    • mixed collection p90 pause times around 175ms
    • no Full GCs, no dropped mutations
Tuning process:
  1. Turn on GC logging
  2. Gather pause times for young collections, mixed collections, and any full collections
    • get logs for at least 2-3 cycles of young => mixed/full transitions
  3. Decide which of the above you want to optimize for, pick a single set of settings
    • Apply the settings to one node on one rack
    • Decide whether it had the desired effect
    • Tweak and repeat on single node until you get to a stable point
  4. Apply settings to all nodes on one rack
    • Wait for a peak traffic period or apply stress
    • Compare results from non-tuned racks with the tuned rack
    • Tweak and repeat on single rack until settings are rock solid
  5. Apply settings to full cluster
    • Wait for a peak traffic period or apply stress
    • Make sure settings are rock solid for full cluster
  6. Return to step 2 and repeat until you have nothing left to tune
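Step 2's "gather pause times" reduces to computing a percentile over the young and mixed samples separately. A minimal sketch using nearest-rank p90, with hypothetical sample data in milliseconds:

```python
def p90(pauses_ms):
    """Nearest-rank-style 90th percentile of a list of pause times (ms)."""
    if not pauses_ms:
        raise ValueError("no pauses recorded")
    ordered = sorted(pauses_ms)
    # index of the sample at or below which ~90% of the values fall
    return ordered[int(0.9 * (len(ordered) - 1))]

# Hypothetical young-collection pauses gathered from gc.log
young = [41, 45, 47, 48, 50, 52, 53, 55, 60, 90]
print(p90(young))  # -> 60
```

Comparing this number per node before and after a settings change (step 3) is what tells you whether a tweak actually moved the needle, rather than eyeballing raw log lines.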