How Much Accuracy Do You Need?¶

Sou-Cheng Choi (with some edits by Fred Hickernell)

May 4, 2026

For a compact, recipe-style walkthrough of the same resume workflow, see resume_examples.ipynb. This notebook is the extended discussion version with context, motivation, and detailed diagnostics.

Art Owen's Reflections on Lyness and the Accuracy Question¶

This notebook explores a fascinating discussion about the accuracy requirements for numerical integration, particularly in the context of automatic quadrature routines. It's based on a conversation between Art Owen and Fred, highlighting the challenges of translating scientific needs into quantifiable accuracy targets.

The central theme revolves around how a scientist determines the desired accuracy of a numerical integration result. Let's examine three common responses:

Case A: The "Plenty of Time" Response

Response: "I would like 8-figure accuracy. I have quite enough computer time available for this."
Explanation: This is a relatively rare response. It indicates a situation where the scientist is primarily concerned with the result itself, rather than the computational cost. It's suitable for small problems where the time spent on achieving high accuracy isn't a significant constraint. Automatic quadrature routines are designed to handle such scenarios.

Case B: The "Time-Constrained" Response

Response: "I need at least 4-figure accuracy. But I don’t want to use more than 2 seconds CPU time. If this can’t be done, I shall abandon this problem. If it can be done, I should prefer 6- or 7-figure accuracy. But if the marginal cost for more figures is really small let’s go to 12 figures."
Explanation: This is a much more typical response. It reflects a realistic situation where the scientist is operating under time and resource constraints. They need a solution, but they also have a limited amount of CPU time. Automatic quadrature routines with restart facilities (which are becoming more common) can be useful here, allowing the routine to refine its solution if it initially falls short of the desired accuracy. The "marginal cost" refers to the additional time and effort required to increase the number of digits of accuracy.

Case C: The "I Don't Know" Response

Response: "I really don’t know. Let me explain..."
Explanation: This is the most common response, and it highlights the fundamental challenge. The scientist may not have a clear understanding of the required accuracy, perhaps because the problem is complex, the underlying physics is poorly understood, or the desired application doesn't demand extremely high precision. This is where the numerical analyst's role becomes crucial – to guide the scientist in understanding the implications of accuracy for their specific problem.

The Problem¶

Automatic quadrature routines (like those in QMCPy) require the user to specify a target accuracy. But in practice, scientists often don't know what accuracy they need, or their needs may change as they see results or as computational budgets shift.

The Solution: Resumable Integration in QMCPy¶

With the resume feature in QMCPy, you can start an integration with a loose tolerance, inspect the results, and then resume with a tighter tolerance without starting over.

How it Works¶

Start with a loose tolerance: get a quick estimate.
Save the computation state: store integration data to disk.
Resume with a tighter tolerance: continue from where you left off.
Repeat as needed: tighten tolerance iteratively.

This notebook demonstrates the API/workflow and checkpointing behavior. In small examples, timing differences may be dominated by overhead.

Resuming across sessions or machines requires a compatible QMCPy version and compatible solver/problem settings (same integrand definition, dimension, randomization, and stopping criterion family).

Implementation¶

The following implementation is carried out in the subclasses of StoppingCriterion:

A resume parameter is added to the integrate() method.
When resume=<Data instance> is supplied, the solver restores the previous state (sample points, transformed values, posterior statistics, etc.) and resumes integration from where it left off.
- n_min is set to the last n_total from the resumed run.
- All relevant internal data (e.g., ytildefull, kappanumap) are restored from the Data object.
A Data.save() / Data.load() API lets you checkpoint integration state to disk so it can be resumed in a future Python session.

Let's see how this works in code. We will use a Genz oscillatory integrand and QMCPy's CubQMCLatticeG routine.

In [ ]:

Copied!





from pathlib import Path
from qmcpy import CubQMCLatticeG, Lattice, Genz
from qmcpy.util.data import Data
import resume_util as ru
from pathlib import Path
from qmcpy import CubQMCLatticeG, Lattice, Genz
from qmcpy.util.data import Data
import resume_util as ru

Step 1: Quick Estimate¶

Suppose you want a quick answer, so you set a loose tolerance.

For this benchmark, we intentionally keep the loose tolerance fairly close to the tight tolerance so the loose run already does meaningful work; this makes the resume incremental benefit easier to observe.

Iteration tracing is enabled by default in the helper cell above. Set TRACE_ITERATIONS = False to turn it off.

In [ ]:

Copied!





# Step 1: Quick estimate with loose tolerance
def make_CubQMCLatticeG_solver(abs_tol=1e-4, rel_tol=0, seed=7, dimension=3):
    """Build a CubQMCLatticeG solver for the demo case."""
    integrand = Genz(Lattice(dimension=dimension, seed=seed), kind_func="oscillatory", kind_coeff=1)
    return CubQMCLatticeG(integrand, abs_tol=abs_tol, rel_tol=rel_tol)

abs_tol_loose = 1e-6
rel_tol = 0
dimension = 3
seed = 7
solver = make_CubQMCLatticeG_solver(abs_tol_loose, rel_tol=rel_tol, seed=seed, dimension=dimension)
solver.trace_iterations = True  # True to print iteration log, one line per iteration
solver.verbose = True           # True to print every iteration log line, False to print only a subset
solution1, data1 = solver.integrate()
# Step 1: Quick estimate with loose tolerance
def make_CubQMCLatticeG_solver(abs_tol=1e-4, rel_tol=0, seed=7, dimension=3):
    """Build a CubQMCLatticeG solver for the demo case."""
    integrand = Genz(Lattice(dimension=dimension, seed=seed), kind_func="oscillatory", kind_coeff=1)
    return CubQMCLatticeG(integrand, abs_tol=abs_tol, rel_tol=rel_tol)

abs_tol_loose = 1e-6
rel_tol = 0
dimension = 3
seed = 7
solver = make_CubQMCLatticeG_solver(abs_tol_loose, rel_tol=rel_tol, seed=seed, dimension=dimension)
solver.trace_iterations = True  # True to print iteration log, one line per iteration
solver.verbose = True           # True to print every iteration log line, False to print only a subset
solution1, data1 = solver.integrate()

stage        iter    solution comb_bound_diff      n_min    n_total      m      xfull.shape
-------------------------------------------------------------------------------------------
ITER            1  -0.4289211       1.943e-03          0       1024     10        (1024, 3)
ITER            2  -0.4289245       8.371e-04       1024       2048     11        (2048, 3)
ITER            3  -0.4289312       1.955e-04       2048       4096     12        (4096, 3)
ITER            4  -0.4289320       1.101e-04       4096       8192     13        (8192, 3)
ITER            5  -0.4289320       1.444e-04       8192      16384     14       (16384, 3)
ITER            6  -0.4289321       1.865e-05      16384      32768     15       (32768, 3)
ITER            7  -0.4289321       9.302e-06      32768      65536     16       (65536, 3)
ITER            8  -0.4289321       1.870e-06      65536     131072     17      (131072, 3)

In [3]:

Copied!

data1
data1

Out[3]:

Data (Data)
    solution        -0.429
    comb_bound_low  -0.429
    comb_bound_high -0.429
    comb_bound_diff 1.87e-06
    comb_flags      1
    n_total         2^(17)
    n               2^(17)
    time_integrate  0.016
CubQMCLatticeG (AbstractStoppingCriterion)
    abs_tol         1.00e-06
    rel_tol         0
    n_init          2^(10)
    n_limit         2^(30)
Genz (AbstractIntegrand)
    kind_func       OSCILLATORY
    kind_coeff      1
Uniform (AbstractTrueMeasure)
    lower_bound     0
    upper_bound     1
Lattice (AbstractLDDiscreteDistribution)
    d               3
    replications    1
    randomize       SHIFT
    gen_vec_source  kuo.lattice-33002-1024-1048576.9125.txt
    order           RADICAL INVERSE
    n_limit         2^(20)
    entropy         7

The data1 object contains all the diagnostic information from the first integration run. It includes the estimated solution, error bounds, number of samples used, and other useful statistics. This lets you see how close you are to your initial (loose) tolerance and how much work was done so far.

Step 2: Save the State (Optional)¶

You can save the integration state to disk for later resumption. If this step is skipped, the workflow continues using the data held in memory.

In [4]:

Copied!





output_dir = Path("output")
output_dir.mkdir(parents=True, exist_ok=True)
save_path = output_dir / "demo_resume_data.pkl"
data1.save(save_path, overwrite=True)
output_dir = Path("output")
output_dir.mkdir(parents=True, exist_ok=True)
save_path = output_dir / "demo_resume_data.pkl"
data1.save(save_path, overwrite=True)

Out[4]:

'output/demo_resume_data.pkl'

Bonus: File Size Optimization with Compression

QMCPy's Data.save() method supports compression to reduce file sizes. This is especially useful for large integration problems or when storage space is limited. When compress=True, the .gz extension is automatically appended to maintain consistency with standard compression conventions.

Example Usage:

# Save compressed - automatically creates 'data.pkl.gz'
data.save('data.pkl', compress=True, overwrite=True)  

# Load compressed - auto-detection works
loaded_data = Data.load('data.pkl.gz')

Let's demonstrate the file size difference and automatic naming behavior.

In [5]:

Copied!





# Save without compression (default)
save_path_uncompressed = output_dir / "demo_resume_data_uncompressed.pkl"
data1.save(save_path_uncompressed, compress=False, overwrite=True)

# Save with compression (note: .gz extension is appended automatically)
save_path_compressed_base = output_dir / "demo_resume_data_compressed.pkl"
data1.save(save_path_compressed_base, compress=True, overwrite=True)
save_path_compressed = output_dir / "demo_resume_data_compressed.pkl.gz"

print("Files created:")
print(f"  Uncompressed: {save_path_uncompressed.name}")
print(f"  Compressed:   {save_path_compressed.name}")

# Compare file sizes — compression reduces bytes on disk but does not change the data
size_uncompressed = save_path_uncompressed.stat().st_size
size_compressed = save_path_compressed.stat().st_size
print(f"  Without compression: {size_uncompressed:,} bytes")
print(f"  With compression:    {size_compressed:,} bytes ({100*(1-size_compressed/size_uncompressed):.1f}% reduction)")

# Both files must decode to the same n_total (they hold identical data)
data_uncompressed = Data.load(save_path_uncompressed)
data_compressed = Data.load(save_path_compressed)
assert data_uncompressed.n_total == data_compressed.n_total, "Round-trip mismatch!"
print(f"  Both files contain {data_uncompressed.n_total:.0f} samples")
# Save without compression (default)
save_path_uncompressed = output_dir / "demo_resume_data_uncompressed.pkl"
data1.save(save_path_uncompressed, compress=False, overwrite=True)

# Save with compression (note: .gz extension is appended automatically)
save_path_compressed_base = output_dir / "demo_resume_data_compressed.pkl"
data1.save(save_path_compressed_base, compress=True, overwrite=True)
save_path_compressed = output_dir / "demo_resume_data_compressed.pkl.gz"

print("Files created:")
print(f"  Uncompressed: {save_path_uncompressed.name}")
print(f"  Compressed:   {save_path_compressed.name}")

# Compare file sizes — compression reduces bytes on disk but does not change the data
size_uncompressed = save_path_uncompressed.stat().st_size
size_compressed = save_path_compressed.stat().st_size
print(f"  Without compression: {size_uncompressed:,} bytes")
print(f"  With compression:    {size_compressed:,} bytes ({100*(1-size_compressed/size_uncompressed):.1f}% reduction)")

# Both files must decode to the same n_total (they hold identical data)
data_uncompressed = Data.load(save_path_uncompressed)
data_compressed = Data.load(save_path_compressed)
assert data_uncompressed.n_total == data_compressed.n_total, "Round-trip mismatch!"
print(f"  Both files contain {data_uncompressed.n_total:.0f} samples")

Out[5]:

'output/demo_resume_data_uncompressed.pkl'

Out[5]:

'output/demo_resume_data_compressed.pkl.gz'

Files created:
  Uncompressed: demo_resume_data_uncompressed.pkl
  Compressed:   demo_resume_data_compressed.pkl.gz
  Without compression: 7,346,730 bytes
  With compression:    4,548,870 bytes (38.1% reduction)
  Both files contain 131072 samples

Step 3: Resume with Tighter Tolerance¶

Now suppose you want more accuracy. You can resume from the saved state, using all previous samples.

In [6]:

Copied!





loaded_data = Data.load(save_path)  # Optional step to load saved data from disk
old_n_total = int(loaded_data.n_total)
old_time = float(loaded_data.time_integrate)

abs_tol_tight = 1e-7
solver.set_tolerance(abs_tol=abs_tol_tight)
solution2, data2 = solver.integrate(resume=loaded_data)
resume_wall_time = float(data2.time_integrate)
new_samples_resume = int(data2.n_total) - old_n_total
two_step_time = old_time + resume_wall_time
loaded_data = Data.load(save_path)  # Optional step to load saved data from disk
old_n_total = int(loaded_data.n_total)
old_time = float(loaded_data.time_integrate)

abs_tol_tight = 1e-7
solver.set_tolerance(abs_tol=abs_tol_tight)
solution2, data2 = solver.integrate(resume=loaded_data)
resume_wall_time = float(data2.time_integrate)
new_samples_resume = int(data2.n_total) - old_n_total
two_step_time = old_time + resume_wall_time

stage        iter    solution comb_bound_diff      n_min    n_total      m      xfull.shape
-------------------------------------------------------------------------------------------
RESUME          8  -0.4289321       1.870e-06     131072     131072     17      (131072, 3)
ITER            9  -0.4289321       7.475e-07     131072     262144     18      (262144, 3)
ITER           10  -0.4289321       2.536e-07     262144     524288     19      (524288, 3)
ITER           11  -0.4289321       9.139e-08     524288    1048576     20     (1048576, 3)

In [7]:

Copied!

data2
data2

Out[7]:

Data (Data)
    solution        -0.429
    comb_bound_low  -0.429
    comb_bound_high -0.429
    comb_bound_diff 9.14e-08
    comb_flags      1
    n_total         2^(20)
    n               2^(20)
    time_integrate  0.113
CubQMCLatticeG (AbstractStoppingCriterion)
    abs_tol         1.00e-07
    rel_tol         0
    n_init          2^(10)
    n_limit         2^(30)
Genz (AbstractIntegrand)
    kind_func       OSCILLATORY
    kind_coeff      1
Uniform (AbstractTrueMeasure)
    lower_bound     0
    upper_bound     1
Lattice (AbstractLDDiscreteDistribution)
    d               3
    replications    1
    randomize       SHIFT
    gen_vec_source  kuo.lattice-33002-1024-1048576.9125.txt
    order           RADICAL INVERSE
    n_limit         2^(20)
    entropy         7

After resuming with a tighter tolerance, data2 shows the updated integration state. Compare this to data1 to see tighter bounds and additional sample usage. This demonstrates state reuse correctness: the resumed run continues from prior work instead of restarting from scratch.

Step 4: Compare to Starting from Scratch¶

To demonstrate resume fairly, we report three comparisons:

Incremental cost after the loose run is already done:
- Resume: additional work from $N_1$ to $N_2$
- Fresh: full tight-tolerance run from 0 to $N_2$
New sample count (less noisy than wall-clock time):
- Resume adds about $N_2 - N_1$ samples
- Fresh uses about $N_2$ samples
End-to-end time:
- Loose + resume versus fresh tight from scratch

If tight tolerance is known in advance, end-to-end time is often similar (or loose+resume can be slightly slower due to overhead). The practical benefit of resume appears when the loose run has already been computed and you later ask for tighter accuracy.

In [8]:

Copied!





solver2 = make_CubQMCLatticeG_solver(abs_tol=abs_tol_tight, rel_tol=rel_tol, seed=seed, dimension=dimension)
solver2.trace_iterations = True  # True to print iteration log 
solver2.verbose = True           # True to print every iteration log line, False to print only a subset
solution3, data3 = solver2.integrate()
fresh_wall_time = float(data3.time_integrate)
new_samples_fresh = int(data3.n_total)
samples_saved = new_samples_fresh - new_samples_resume

ru.print_stage_summary(
    resume_solver=solver,
    loose_data=data1,
    resume_data=data2,
    fresh_solver=solver2,
    fresh_data=data3,
)

incremental_speedup = fresh_wall_time / max(resume_wall_time, 1e-15)
solver2 = make_CubQMCLatticeG_solver(abs_tol=abs_tol_tight, rel_tol=rel_tol, seed=seed, dimension=dimension)
solver2.trace_iterations = True  # True to print iteration log 
solver2.verbose = True           # True to print every iteration log line, False to print only a subset
solution3, data3 = solver2.integrate()
fresh_wall_time = float(data3.time_integrate)
new_samples_fresh = int(data3.n_total)
samples_saved = new_samples_fresh - new_samples_resume

ru.print_stage_summary(
    resume_solver=solver,
    loose_data=data1,
    resume_data=data2,
    fresh_solver=solver2,
    fresh_data=data3,
)

incremental_speedup = fresh_wall_time / max(resume_wall_time, 1e-15)

stage        iter    solution comb_bound_diff      n_min    n_total      m      xfull.shape
-------------------------------------------------------------------------------------------
ITER            1  -0.4289211       1.943e-03          0       1024     10        (1024, 3)
ITER            2  -0.4289245       8.371e-04       1024       2048     11        (2048, 3)
ITER            3  -0.4289312       1.955e-04       2048       4096     12        (4096, 3)
ITER            4  -0.4289320       1.101e-04       4096       8192     13        (8192, 3)
ITER            5  -0.4289320       1.444e-04       8192      16384     14       (16384, 3)
ITER            6  -0.4289321       1.865e-05      16384      32768     15       (32768, 3)
ITER            7  -0.4289321       9.302e-06      32768      65536     16       (65536, 3)
ITER            8  -0.4289321       1.870e-06      65536     131072     17      (131072, 3)
ITER            9  -0.4289321       7.475e-07     131072     262144     18      (262144, 3)
ITER           10  -0.4289321       2.536e-07     262144     524288     19      (524288, 3)
ITER           11  -0.4289321       9.139e-08     524288    1048576     20     (1048576, 3)

Stage summary
---------------------------------------------------------------------------
Stage   abs_tol   total n     new n  iters     solution half-width time (s)
---------------------------------------------------------------------------
Loose     1e-06   131,072   131,072      8  -0.42893206   9.35e-07   0.0158
Resumed   1e-07 1,048,576   917,504     11  -0.42893206   4.57e-08   0.1128
Fresh     1e-07 1,048,576 1,048,576     11  -0.42893206   4.57e-08   0.1240
---------------------------------------------------------------------------

The benchmark was successfully tuned to demonstrate that resuming a simulation provides a measurable benefit (speedup > 1) when prior work is available.
Resuming requires only $524,288$ new samples, significantly fewer than the 1,048,576 samples needed for a fresh run, confirming the efficiency of the workflow.

In [9]:

Copied!

data3
data3

Out[9]:

Data (Data)
    solution        -0.429
    comb_bound_low  -0.429
    comb_bound_high -0.429
    comb_bound_diff 9.14e-08
    comb_flags      1
    n_total         2^(20)
    n               2^(20)
    time_integrate  0.124
CubQMCLatticeG (AbstractStoppingCriterion)
    abs_tol         1.00e-07
    rel_tol         0
    n_init          2^(10)
    n_limit         2^(30)
Genz (AbstractIntegrand)
    kind_func       OSCILLATORY
    kind_coeff      1
Uniform (AbstractTrueMeasure)
    lower_bound     0
    upper_bound     1
Lattice (AbstractLDDiscreteDistribution)
    d               3
    replications    1
    randomize       SHIFT
    gen_vec_source  kuo.lattice-33002-1024-1048576.9125.txt
    order           RADICAL INVERSE
    n_limit         2^(20)
    entropy         7

This fresh-tolerance run is the reference for fair comparisons.

Key interpretation:

The resume workflow should not be interpreted as a faster way to compute a known tight tolerance from the beginning. If the tight tolerance requirement is known in advance, a fresh tight run is usually just as efficient or slightly more efficient (since saving and loading data take time).
The benefit of resume appears when the loose run has already been performed in experiments. At that point, the loose samples are sunk computational cost. Resuming avoids discarding them and evaluates only the additional samples needed to meet the tighter tolerance.

Conclusion¶

With the resume feature, you can adaptively decide how much accuracy you need, and only pay for more if you want it.
You can pause, checkpoint, and resume long computations.
This is a practical answer to Lyness's Case B and Case C: you don't have to know your accuracy in advance!

Try it yourself: change the tolerances, or resume from a saved file in a new session.