Fan Curves and Chip Temps
Most ASIC miners do not die suddenly; they warn you in telemetry for days or weeks first. If you can read fans and temperatures properly, you can catch the warning signs before you lose hashrate or cook a board.
This article shows you how to interpret fan curves, PCB vs chip temperatures, and common telemetry patterns that predict failure. By the end, you will have a repeatable workflow you can run from a dashboard screenshot or an API dump, and you will know when to act, what to check first, and what not to touch.
Note for South Africa:
- Hot inland afternoons make intake temperature the hidden variable. Log it with a separate thermometer, not only what the miner UI reports.
- Dust, pollen, and coastal salt accelerate heatsink clogging and connector corrosion. Both can distort temps and create intermittent fan faults.
- Load-shedding restarts can create misleading boot-time fan and temp spikes, so always correlate telemetry with restart logs and timestamps.
At a glance:
- First, decide if you are near an overheat shutdown limit for your specific miner and firmware. If you are, reduce power or shut down and investigate.
- If fans are at max but temps still climb, treat it as an airflow or intake temperature problem before you blame the hashboards.
- If one board is consistently hotter, focus on per-board comparison, fan swap tests, and physical contact issues, not room ventilation.
- If temps look fine but HW errors rise, you are in an instability zone. Check logs, voltage and frequency changes, and PSU quality before you chase cooling.
Key takeaways:
- Chip temp and PCB temp are not interchangeable, and the gap between them is a diagnostic signal.
- Fan percentage is not proof of airflow. RPM and temperature delta tell the real story.
- Good troubleshooting starts with the safest checks, then escalates only when telemetry points to invasive work.
What miner telemetry actually is (UI readings, logs, and API)
Telemetry is any measurement your miner reports about itself, typically temperatures, fan speed, voltage, frequency, hashrate, and error counters. You usually see it in the web UI, but the same data may also exist in logs and in an API endpoint.
For troubleshooting, treat the UI as a convenience layer, not a ground truth. Different firmware builds can rename fields, hide per-board readings, or average values in ways that mask the real problem. If you can, capture three things together: a screenshot, a log snippet, and an API snapshot.
- UI: fast to scan, but sometimes rounds values or hides outliers.
- Logs: best for boot events, thermal throttling messages, chain dropouts, and repeated restarts.
- API: best for automation, trend graphs, and verifying per-board telemetry fields via cgminer-compatible commands.
If you are monitoring at scale, standardise your capture bundle: timestamp, miner model, firmware version, ambient intake temperature reading, and a 5 to 10 minute trend of temps and fan RPM.
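One way to keep that bundle consistent is to give it a fixed shape. The sketch below is a minimal Python illustration, not a vendor format: the `TelemetrySnapshot` class, the `to_json` helper, and every field name are hypothetical, chosen only to mirror the items in the capture bundle.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Timestamp in UTC so bundles from different sites sort consistently."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class TelemetrySnapshot:
    # All field names are illustrative assumptions, not a vendor schema.
    miner_model: str
    firmware_version: str
    intake_temp_c: float   # from your own probe at the miner inlet
    chip_temps_c: list     # trend samples over 5 to 10 minutes
    fan_rpm: list          # one list of per-fan RPM values per sample
    captured_at: str = field(default_factory=_utc_now)


def to_json(snapshot: TelemetrySnapshot) -> str:
    """Serialise one bundle so it can be stored next to the screenshot and logs."""
    return json.dumps(asdict(snapshot), sort_keys=True)
```

Storing the bundle as JSON keeps it easy to diff between a healthy day and a bad day, which is usually where the useful signal is.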
On Braiins OS, the REST interface is documented, which helps you separate what is configurable from what is merely displayed in the UI (see the Braiins OS cooling and telemetry API documentation).
Temperature sensors explained, intake, exhaust, PCB temp, chip temp, and what each one tells you
ASIC cooling is about moving heat from the silicon to the air, then moving the hot air out of the room. Telemetry gives you multiple temperature vantage points: some are about the room, others are about the board, and others are about the chips.
Start by separating air temperatures from component temperatures. Air temperatures tell you if the environment and airflow path make sense. Component temperatures tell you if the heatsink contact and heat transfer are healthy.
| Reading | What it usually represents | What it is best for | Common gotcha |
|---|---|---|---|
| Intake or inlet temp | Air entering the miner | Detecting hot rooms or recirculation | Some miners do not measure it; add your own probe |
| Exhaust temp | Air leaving the miner | Checking heat removal and delta | Can be skewed by duct leaks |
| Temp(PCB) | Board-level sensor temperature | Spotting board hotspots and sensor plausibility | May lag chip spikes; do not treat as chip safety limit |
| Temp(Chip) | Chip or chip-adjacent sensor values | Throttling risk and per-chain outliers | Firmware naming differs; confirm the field meaning |
Intake temperature is the baseline. Without it, you cannot tell if high chip temps are caused by bad thermal contact or simply hot air. In South Africa, garages and outbuildings often recirculate hot exhaust back into intake during wind shifts, so a cheap probe at the intake grille is worth more than another dashboard widget.
Exhaust temperature is most useful as a relative metric. Watch the inlet-to-exhaust delta over time and compare miners in the same aisle. A sudden delta change often points to airflow restriction, fan degradation, or a ducting leak.
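As a rough illustration of watching that delta over time, the sketch below compares the average inlet-to-exhaust delta of a recent window against a healthy baseline. The function names and the 3 °C threshold are assumptions for the example, not vendor limits, so tune them against your own miners.

```python
from statistics import mean


def exhaust_delta(intake_c: float, exhaust_c: float) -> float:
    """Temperature rise of the air as it passes through the miner."""
    return exhaust_c - intake_c


def delta_shift(baseline, recent, threshold_c=3.0):
    """Compare average delta in two windows of (intake_c, exhaust_c) pairs.

    Returns (shift_in_c, flagged). threshold_c is a placeholder value.
    """
    base = mean(exhaust_delta(i, e) for i, e in baseline)
    now = mean(exhaust_delta(i, e) for i, e in recent)
    shift = now - base
    return shift, abs(shift) >= threshold_c
```

A positive shift means the air is leaving hotter relative to intake than it used to, which fits reduced airflow for the same heat load. A negative shift deserves a look too, since it can mean a board has dropped out.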
PCB temperature and chip temperature tell different stories. A bigger-than-usual gap between chip and PCB readings can indicate poor heatsink contact or degraded thermal interface, while a small gap with high absolute values can indicate that the whole board is heat soaked. For a repair-aware look at how some Antminer hashboards separate PCB and chip sensing paths, see the S17e repair-oriented explainer on Antminer PCB and chip temperature sensors.
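The gap comparison can be sketched in a few lines. This is a minimal example under stated assumptions: board names, the input shape, and the 5 °C margin are all hypothetical, and the baseline should come from your own healthy readings on the same model and firmware.

```python
def chip_pcb_gaps(boards):
    """boards maps a board name to a (chip_c, pcb_c) tuple; returns the gap per board."""
    return {name: chip - pcb for name, (chip, pcb) in boards.items()}


def widened_gaps(current, baseline, margin_c=5.0):
    """Boards whose chip-to-PCB gap grew by at least margin_c versus baseline.

    margin_c is a placeholder; learn a sensible value from your own fleet.
    """
    now = chip_pcb_gaps(current)
    base = chip_pcb_gaps(baseline)
    return sorted(
        name for name in now
        if name in base and now[name] - base[name] >= margin_c
    )
```

For example, a board that used to sit 15 °C above its PCB reading and now sits 25 °C above it would be flagged, even if neither absolute value looks alarming on its own.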
Do not quote temperature shutdown numbers from random charts. Overheat and throttling thresholds are model-specific and firmware-specific, confirm them in your miner logs, vendor documentation, or your own observations under controlled conditions.
Fan telemetry explained, RPM vs percentage, fan count, and what normal variation looks like
Fan telemetry is easy to misread because many dashboards show a percentage that looks precise. In practice, the reliable signal is fan RPM per fan, plus how those RPM values change as temperatures move.
RPM is what the fan tach signal reports. Percentage is often a control output, not a measurement, and two firmware stacks can map 60 percent to very different RPM. When you are diagnosing, always ask two questions: what is the miner commanding, and what are the fans actually doing?
- Normal variation: a few hundred RPM difference between fans can be fine, especially if one fan sees higher backpressure.
- Concerning variation: one fan consistently much lower, oscillating hard, or dropping to zero intermittently.
- Fan count mismatch: if the telemetry expects two or four fans but reads fewer, treat it as a control or wiring issue first.
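The three cases above can be turned into a quick check. The sketch below is illustrative only: the fan names, the `low_ratio` of 0.8, and the message strings are assumptions, and it works on a single sample, so intermittent dropouts and oscillation still need a trend.

```python
from statistics import median


def classify_fans(rpm_by_fan, expected_count, low_ratio=0.8):
    """Return a list of findings for one sample of per-fan RPM readings.

    low_ratio is a placeholder: a fan below this fraction of the median
    of its spinning peers is reported as low.
    """
    findings = []
    if len(rpm_by_fan) < expected_count:
        findings.append(
            f"fan count mismatch: expected {expected_count}, read {len(rpm_by_fan)}"
        )
    spinning = [rpm for rpm in rpm_by_fan.values() if rpm > 0]
    typical = median(spinning) if spinning else 0
    for name, rpm in sorted(rpm_by_fan.items()):
        if rpm == 0:
            findings.append(f"{name}: zero RPM")
        elif rpm < low_ratio * typical:
            findings.append(f"{name}: low versus peers")
    return findings
```

Using the median of the spinning fans as the reference keeps one bad fan from dragging the comparison point down.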
If you are pulling telemetry via a cgminer-compatible API, document which command you use and what fields you rely on. LuxOS documents the cgminer stats command fields, including fan-related values, which is helpful when you are standardising data collection across fleets (see the LuxOS documentation of cgminer stats fields for fan and temperature).
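For reference, a minimal way to pull such a reply in Python is sketched below. It assumes your firmware exposes the usual cgminer-style JSON API on TCP port 4028 and that API access is enabled for your host; the trailing null byte handling reflects how cgminer-style replies are commonly terminated. The function names are hypothetical, and the field names inside the reply vary by firmware, so confirm them against your own output before relying on them.

```python
import json
import socket


def parse_cgminer_reply(raw: bytes) -> dict:
    """Decode a JSON reply, dropping the trailing null byte if present."""
    text = raw.decode("utf-8", errors="replace").rstrip("\x00").strip()
    return json.loads(text)


def query_cgminer(host: str, command: str, port: int = 4028,
                  timeout: float = 5.0) -> dict:
    """Send one command (for example "stats") and return the parsed reply.

    Port 4028 is an assumption based on the common cgminer default.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode("utf-8"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # miner closed the connection, reply is complete
                break
            chunks.append(chunk)
    return parse_cgminer_reply(b"".join(chunks))
```

Keeping the parser separate from the socket code means you can test it against saved API dumps, and it gives you one place to add cleanup if a particular firmware build returns a reply that does not parse.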
Also watch for a mismatch between fan RPM and sound. A miner that sounds like a jet but reports low RPM may have a reporting fault. A miner that reports high RPM but sounds quiet can indicate a stalled fan, a broken blade, or a tach signal issue.
Reading the fan curve, how automatic fan control targets temperature and what causes oscillation
A fan curve is the relationship between temperature and fan speed as your firmware tries to maintain a target. In automatic modes, the control loop will increase fan speed when a monitored temperature rises, then reduce it when temperatures fall.
Oscillation is the common failure mode of that control loop. You see it as fans ramping up and down in repeating cycles, with temperatures swinging in sync. Sometimes it is harmless, but sometimes it indicates unstable airflow, recirculation, or a sensor reading that jumps.
- Airflow instability: intake air temperature changes rapidly, for example a door opening, an extractor fan cycling, or wind pushing exhaust back into intake.
- Backpressure changes: ducting, filters, or improvised mufflers create a non-linear restriction, so small changes in fan output cause big changes in airflow.
- Sensor noise or misreads: a stuck or intermittent sensor makes the controller chase ghosts.
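To tell real oscillation apart from ordinary jitter in a trend, you can count direction reversals that exceed a minimum swing. The sketch below is one simple way to do that, assuming evenly spaced samples; `count_reversals` and the `min_swing` value you pass are hypothetical and need tuning per miner.

```python
def count_reversals(series, min_swing):
    """Count direction changes whose swing is at least min_swing.

    Works on any evenly sampled series, such as fan RPM or chip temperature.
    Small wiggles below min_swing are ignored.
    """
    if not series:
        return 0
    reversals = 0
    direction = 0          # +1 rising, -1 falling, 0 not yet known
    extreme = series[0]    # running high while rising, running low while falling
    for value in series[1:]:
        if direction == 0:
            if abs(value - extreme) >= min_swing:
                direction = 1 if value > extreme else -1
                extreme = value
        elif direction == 1:
            if value > extreme:
                extreme = value
            elif extreme - value >= min_swing:
                direction, extreme = -1, value
                reversals += 1
        else:
            if value < extreme:
                extreme = value
            elif value - extreme >= min_swing:
                direction, extreme = 1, value
                reversals += 1
    return reversals
```

Several reversals inside a 10 minute window, with temperature swinging in step, is the repeating-cycle signature described above. A count of zero or one is more likely a single ramp.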
If you are running LuxOS, be careful with assumptions about dynamic fan behaviour. LuxOS documents how temperature and fan settings work, including cases where fan speed may not be adjusted dynamically, which can make a miner look stable until it is not (see the LuxOS cooling documentation on temperature and fan settings).
On stock Bitmain firmware, older manuals explicitly describe fan speed control being linked to temperature when a manual option is not forced. Even if you are not using an S7, the manual is a useful reminder that a single checkbox can change the whole meaning of your fan telemetry (see the Bitmain manual note that Antminer stock firmware fan speed depends on temperature).
Common mistakes
- Using a single max temperature number and ignoring per-board spread.
- Assuming fan percentage equals airflow, without checking RPM and intake temperature.
- Chasing chip temps when the real issue is recirculation and hot intake air.
- Forcing manual fan speed and then forgetting it is forced during later troubleshooting.
- Ignoring logs after load-shedding restarts, especially boot-time fan tests and chain init failures.
If you’re new
- Start with trend graphs, not single readings. Capture at least 10 minutes of temps and RPM.
- Add one independent intake air thermometer at the miner inlet, then log it daily.
- Learn your normal per-board temperature spread when the miner is healthy.
- Write down your firmware version, because telemetry field names and behaviours change.
- Make one change at a time, then observe. Do not tweak frequency, voltage, and fan settings together.
If you have done this before
- Standardise your snapshot bundle: UI screenshot, logs, API output, and ambient intake reading.
- Track chip-to-PCB delta per board over time, because it is often more predictive than absolute numbers.
- Look for oscillation patterns around ducting changes and seasonal intake shifts.
- Validate fan tach signals with a swap test before you blame a control board.
- Keep a known-good fan pair and a known-good PSU for A/B isolation testing.
The 6 telemetry patterns that predict trouble (before the miner hard-fails)
Most thermal failures show up as patterns, not one-off spikes. Your goal is to recognise which subsystem is most likely at fault: airflow, fans, thermal interface, sensors, firmware control logic, or power stability.
Below are six high-signal patterns that advanced operators use. You can detect most of them without opening the miner, which matters when you are trying to avoid turning a minor airflow issue into a broken connector.
- Pattern 1: rising chip temp with flat fan RPM.
- Pattern 2: one board hotter than the rest.
- Pattern 3: high fan RPM but worsening temps.
- Pattern 4: stable temps but rising HW errors.
- Pattern 5: implausible sensor values (stuck, zero, or sudden jumps).
- Pattern 6: repeated boot-time fan ramps after power events, with delayed stabilisation.
Pattern 1, rising chip temp with flat fan RPM (airflow restriction or bad fan signal)
If chip temperatures climb while fan RPM stays flat, the miner is not responding to heat the way you expect. That can mean the firmware is not in automatic mode, fan control is locked, or the miner is not receiving a valid temperature signal to trigger a response.
It can also mean airflow has collapsed even though fans are spinning, for example a blocked intake, a collapsed duct, or a filter that has loaded up with fine dust. In coastal areas, salt-laden dust can build a sticky mat that looks minor from the outside but chokes fin stacks inside.
- Confirm fan mode is not manually forced and that automatic cooling is enabled for your firmware.
- Check for obvious restrictions, intake grille blockage, collapsed flex ducting, or a new noise muffling attempt.
- Swap fans left to right. If the symptom moves with the fan, you have a fan or tach problem.
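If you want to flag this pattern from a trend export, a first pass can be very simple. The sketch below assumes one chip temperature and one fan RPM value per sample; the 5 °C rise and 200 RPM span are placeholder thresholds, not values from any vendor.

```python
def rising_temp_flat_fans(chip_temps_c, fan_rpm, min_rise_c=5.0, max_rpm_span=200):
    """True when chip temperature climbed across the window while RPM barely moved.

    Both lists must cover the same samples. Thresholds are placeholders.
    """
    if len(chip_temps_c) < 2 or len(chip_temps_c) != len(fan_rpm):
        raise ValueError("need two or more samples and equal-length series")
    rise = chip_temps_c[-1] - chip_temps_c[0]
    rpm_span = max(fan_rpm) - min(fan_rpm)
    return rise >= min_rise_c and rpm_span <= max_rpm_span
```

A hit does not say which cause applies. It only tells you to run the checks above, starting with the fan mode, before anything more invasive.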
Pattern 2, one board hotter than the rest (thermal paste, heatsink contact, sensor, or partial chain issues)
Per-board outliers are your best clue that the room is not the main problem. If one hashboard runs consistently hotter by a meaningful margin while intake temperature is stable, suspect thermal interface degradation, uneven heatsink contact, or a local airflow bypass across that board.
Sensor faults can mimic this too. A stuck sensor may show a board that is permanently hot or permanently cool, which can trick the control loop into over-fanning or under-fanning.
- Compare PCB and chip readings for that board, and look for an unusual delta.
- Check if the board has lower hashrate or more HW errors at the same time.
- Reseat connectors only if you are confident, and follow manufacturer handling guidance to avoid damage (see Bitmain’s do’s and don’ts for Antminer operation, which cover overheating and handling).
If your firmware exposes per-board temperatures via cgminer commands, use them rather than a single max number. LuxOS documents a dedicated temps command that illustrates per-board reporting, and it also notes that fields can vary by model, which is your cue to validate before you build alerts (see the LuxOS cgminer temps command documentation on per-board temperature telemetry).
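Once you have per-board values, an outlier check might look like the sketch below. It compares each board against the median of the other boards; the board names and the 8 °C margin are assumptions, so replace the margin with the normal spread you have measured on a healthy miner.

```python
from statistics import median


def hot_board_outliers(board_temps_c, margin_c=8.0):
    """Boards running at least margin_c above the median of the other boards.

    Returns {board_name: excess_in_c}. margin_c is a placeholder value.
    """
    outliers = {}
    for name, temp in board_temps_c.items():
        others = [t for other, t in board_temps_c.items() if other != name]
        if not others:
            continue  # a single board has nothing to be compared against
        excess = temp - median(others)
        if excess >= margin_c:
            outliers[name] = excess
    return outliers
```

Comparing against the other boards, rather than against a fixed number, keeps the check meaningful on hot days when every board runs warmer.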
Pattern 3, high fan RPM but worsening temps (intake too hot, recirculation, ducting leaks, clogged filters)
If fans are screaming and temperatures still climb, the miner is doing its part but the air is not. The two most common causes in home and small farm setups are hot intake air and recirculation, especially when exhaust air does not have a clean path out of the building.
In South Africa, this pattern often peaks during late afternoon heat, or during a load-shedding return when multiple miners restart together and dump heat into a small room. It can also happen when you add ducting and accidentally create a leak that pulls exhaust back into the intake side.
- Measure intake air at the miner, not across the room.
- Check for short-circuiting paths, hot air rolling along the ceiling back to the intake.
- Inspect filters and fin stacks for dust mats, and do not assume a quick external wipe is enough.
- Review your room layout against a known separation approach; see our mining room ventilation layouts for separating hot and cold air paths.
Pattern 4, stable temps but rising HW errors (chip instability, undervolt, PSU ripple, or marginal chips)
Stable temperatures do not guarantee stable hashing. If HW errors rise, rejected shares increase, or hashrate droops while temperatures look fine, you are likely dealing with electrical or silicon margin issues rather than cooling alone.
This can show up after firmware tuning, after a PSU change, or after repeated hard power cuts. Load-shedding can contribute because frequent cold starts stress marginal components, and some systems may come back on with voltage dips if your upstream power is unstable.
- Check logs for throttling, frequency step-downs, or chain disable events.
- Confirm you did not recently change undervolt or frequency settings.
- Inspect PSU connections and look for signs of heating at connectors, only when powered down and safe.
If you use fleet tools, note that some platforms expose protective settings like forced fan speed or chain disable thresholds. Awesome Miner documents Antminer-related configuration concepts that can affect how telemetry behaves, such as a manual fan speed override and temperature protection settings.
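Because HW errors are usually reported as a running counter, it helps to convert them into a rate before judging a trend. The sketch below is one hedged way to do that; the sample format, function names, and the 3 °C stability span are assumptions for illustration.

```python
def hw_error_rates(samples):
    """Errors per hour between consecutive (elapsed_seconds, hw_error_count) samples.

    Intervals where the counter went backwards are skipped, since that
    usually means the miner restarted and the counter reset.
    """
    rates = []
    for (t0, e0), (t1, e1) in zip(samples, samples[1:]):
        if t1 <= t0 or e1 < e0:
            continue
        rates.append((e1 - e0) * 3600 / (t1 - t0))
    return rates


def errors_rising_with_stable_temps(samples, chip_temps_c, max_temp_span_c=3.0):
    """True when the error rate grew while chip temperature stayed in a narrow band."""
    rates = hw_error_rates(samples)
    stable = max(chip_temps_c) - min(chip_temps_c) <= max_temp_span_c
    rising = len(rates) >= 2 and rates[-1] > rates[0]
    return stable and rising
```

A true result points you toward logs, tuning changes, and the PSU, in line with the checks above, rather than toward the cooling path.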
Pattern 5, implausible sensor values (stuck, zero, sudden jumps)
Implausible readings are those that do not track reality, for example a chip temp stuck at one number for hours, a PCB temp reading zero, or a sudden jump of tens of degrees with no matching fan response. Treat this as a data integrity issue until proven otherwise.
Your verification process should use cross-sensor comparison first, then logs, then physical measurement. The goal is to avoid opening the miner for a fake problem caused by a reporting glitch or a loose sensor input.
- Compare the suspect sensor to other boards and to intake and exhaust behaviour.
- Reboot only if you have a controlled window, then compare pre and post reboot readings.
- Use an external thermometer or thermal camera spot check if available.
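Before any of those checks, a quick plausibility screen on the raw series can tell you which kind of implausibility you are looking at. The sketch below is a minimal example; the 20 °C step and the 30-sample stuck window are placeholders that depend on your sampling interval and sensor resolution.

```python
def sensor_flags(readings_c, max_step_c=20.0, stuck_after=30):
    """Return a list of flags ("zero", "jump", "stuck") for one sensor series.

    max_step_c and stuck_after are placeholder thresholds, not vendor values.
    """
    flags = []
    if any(r == 0 for r in readings_c):
        flags.append("zero")
    steps = zip(readings_c, readings_c[1:])
    if any(abs(b - a) >= max_step_c for a, b in steps):
        flags.append("jump")
    recent = readings_c[-stuck_after:]
    if len(readings_c) >= stuck_after and len(set(recent)) == 1:
        flags.append("stuck")
    return flags
```

A sensor that reports whole degrees can legitimately repeat a value for a while, so treat "stuck" as a prompt to cross-check against other boards, not as proof of a fault.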
Pattern 6, repeated boot-time fan ramps after power events (load-shedding signature)
Many miners run fans hard during boot, then settle into their control loop after sensors initialise and hashing starts. After load-shedding or inverter cutovers, repeated restarts can create a confusing telemetry trace: fan ramps, then a slow climb in chip temps as the room heat soaks.
When you see this, correlate with power event timestamps. Your fix may be operational: stagger restarts, improve intake flow, or avoid hot restarts that occur when the room is already heat soaked.
- Check if multiple miners restart together and spike room temperature.
- Review boot logs for chain init issues that leave one board idle or erroring.
- Consider staged power-up and better airflow separation rather than higher fan limits.
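Both ideas, correlating restarts with power events and staging power-up, are easy to prototype. In the sketch below, timestamps are plain epoch seconds, and the 300 second window and 120 second gap are arbitrary placeholders. How you actually delay a miner's start depends on your power hardware and is outside this example.

```python
def restarts_near_power_events(restart_times, power_event_times, window_s=300):
    """Restart timestamps that fall within window_s seconds after a power event."""
    return [
        restart for restart in restart_times
        if any(0 <= restart - event <= window_s for event in power_event_times)
    ]


def staggered_delays(miner_names, gap_s=120):
    """Assign each miner a start delay so they do not all power up together."""
    return {name: index * gap_s for index, name in enumerate(miner_names)}
```

Restarts that do not line up with any power event are the interesting ones, because they point at the miner or its PSU rather than at load-shedding.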
What to do, step-by-step response plan (from safest checks to invasive work)
Use this troubleshooting flow when you have a telemetry snapshot and you need to decide what to do next. It prioritises safe, external checks first, then moves toward swaps and only then toward opening hardware. If you are running a farm, write this into your standard operating procedure so every technician follows the same escalation path.
- Safety and limits check: Identify which temperature reading is your control variable, and whether any reading is approaching your miner model and firmware protection limits. If you are close, reduce power or shut down and let it cool before further checks.
- Confirm control mode: Verify whether fan control is automatic or manually forced. Record firmware version and any recent changes, because cooling logic can change between versions, and release notes can matter for diagnosis (see the Braiins OS release notes on fan behaviour and thermal control changes).
- Measure intake air: Take a physical intake temperature reading at the miner. If intake is high, fix the room and airflow path before touching the miner.
- Check airflow path: Look for clogged filters, blocked grilles, collapsed ducts, recirculation paths, and duct leaks. If you need layout ideas, review our mining room ventilation layouts for preventing recirculation in a home mining room.
- Fan verification: Compare per-fan RPM values, listen for abnormal bearings, and do a fan swap test if one fan looks suspect. If the low RPM follows the fan, replace that fan.
- Per-board comparison: Use per-board temps and errors to identify outliers. If one board is consistently hotter, suspect thermal interface, heatsink contact, or local airflow bypass inside the chassis.
- Log-driven checks: Search logs for throttling, frequency drops, chain disable events, and sensor read errors. Capture the exact messages and timestamps before you reboot.
- Only then consider invasive work: Power down, discharge, and follow handling guidance. If you are not repair-equipped, stop at visual inspection and connector reseating, and escalate to a repair tech.
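If you write this flow into a standard operating procedure, it can help to encode the order explicitly so that nobody skips ahead. The sketch below is a simplified illustration of the escalation order in the steps above; the finding keys and action strings are hypothetical labels, and a human still has to decide whether each finding is true.

```python
# Ordered from safest to most invasive, mirroring the steps above.
# Keys and wording are illustrative assumptions, not a standard.
ESCALATION = [
    ("near_protection_limit", "reduce power or shut down and let it cool"),
    ("fan_mode_forced", "record the forced fan mode and firmware version first"),
    ("intake_hot", "fix the room and airflow path before touching the miner"),
    ("airflow_restricted", "clear filters, grilles, ducts, and recirculation paths"),
    ("fan_suspect", "run a fan swap test and replace the fan if the fault follows it"),
    ("board_outlier", "compare per-board temps and errors for that board"),
    ("log_errors", "capture exact log messages and timestamps before rebooting"),
]


def next_action(findings):
    """Return the first applicable action, so safer checks always come first."""
    for key, action in ESCALATION:
        if findings.get(key):
            return action
    return "no trigger found: keep logging before considering invasive work"
```

For example, a snapshot that shows both a hot intake and a board outlier returns the intake action first, which matches the rule of fixing the environment before suspecting hardware.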
If you want a second set of eyes on telemetry patterns, send a screenshot plus a 10 minute trend and your intake temperature reading to our team via the Sell Your PC contact page.
South Africa lens, heat waves, dust, coastal corrosion, load-shedding restarts, and noise constraints
South African conditions add two practical complications: hotter intake swings and more frequent dirty air events. Both make miners appear inconsistent, when the real issue is that the environment is inconsistent. Your telemetry interpretation improves fast when you add one independent intake thermometer and you log it alongside miner-reported values.
Dust and pollen load up heatsinks and filters faster than many people expect, especially in garage mining. Coastal corrosion is a separate enemy: salt air can attack connectors and create intermittent faults that show up as fan tach dropouts, chain flaps, and sensor noise.
- Heat waves: Watch intake temperature first, then chip temps, then fan RPM. Do not treat high fan RPM as a solution if intake is already too hot.
- Dust: Use gentle, controlled cleaning methods and avoid forcing debris deeper into fin stacks. If you are unsure, get professional help via our ASIC and IT support services.
- Coastal zones: Inspect connectors and fan plugs more often, and watch for intermittent readings.
- Load-shedding: Correlate restarts with telemetry, because repeated restarts can create heat soak and misleading oscillation. Consider power resilience planning, and keep your inverter and power path healthy. Our professional inverter repair services may help if you are chasing unstable cutovers.
- Noise constraints: Do not box in miners or add restrictive muffling without measuring intake temperature and exhaust behaviour first.
If you are expanding a small setup, make sure your airflow plan and miner selection align. You can compare different miner categories in our shop sections for bitcoin ASIC miners and for hydro and immersion miners, then plan telemetry thresholds around the cooling method.
Frequently asked questions
Which temperature should I alarm on, PCB temp or chip temp?
Alarm on what your firmware uses for protection and control, which is often chip-oriented, but validate it per model and firmware. In practice, track both and alert on abnormal deltas between chip and PCB readings, because that delta can flag heatsink contact problems before absolute limits are reached.
My fans are at 100 percent but temps still rise, does that mean the miner is faulty?
Not automatically. It often means the air is too hot or not moving, due to recirculation, blocked intake, duct leaks, or clogged fin stacks. Measure intake temperature at the miner and check the airflow path before you open the chassis.
Why do fan RPM numbers look different across firmware and dashboards?
Fan RPM is a measurement, but dashboards can average it, hide per-fan values, or display percent output instead. Firmware can also change which sensors drive the control loop, and how target temperature is enforced, so record firmware version when comparing readings.
How do I tell if a temperature sensor is lying?
Look for implausible behaviour, such as a sensor stuck at one value, sudden jumps with no matching fan response, or values that do not align with intake and exhaust behaviour. Cross-check with other boards, check logs for sensor errors, and if possible verify with an external measurement before you replace parts.
How often should I clean miners in dusty South African environments?
Use telemetry to set your interval. If you see a gradual rise in chip temps at the same intake temperature, or fans trending higher RPM for the same load, cleaning is due. Avoid aggressive methods that can damage connectors or push debris deeper, and if you are not confident, use a professional service.
Short summary
- Log intake air temperature separately, it is the baseline for every other reading.
- Use per-board comparisons and chip-to-PCB deltas to spot early thermal interface problems.
- Treat fan percentage as a control signal, rely on RPM plus temperature trends for diagnosis.
- When fans max out but temps climb, fix airflow and recirculation before you blame hashboards.
- Correlate telemetry with load-shedding restarts and logs to avoid chasing false patterns.
This is educational content, not financial advice.