Levels of Evidence

From HemOnc.org - A Hematology Oncology Wiki

Revision as of 22:36, 23 October 2016

The purpose of this page is to create a reference to describe our methodology for assigning levels of evidence to regimens.

Important note: Our intent is not to provide clinical decision support. Rather, our goal is to faithfully reproduce findings of clinical trials. Efficacy and toxicity information, in particular, is sometimes presented by authors in a confusing or ambivalent manner. As such, we try to illustrate ambiguities when they happen, and take no responsibility for your decision to choose a particular treatment regimen. Please read our disclaimer for further information.

A bit of background: We have taken a simplified approach to information visualization, based on a three-color "traffic signal" metric. In order to account for color-blindness, text is also included within each colored box. The colors we use are:

Green box text Yellow box text Red box text

See the sections below for a discussion of the various metrics we use.


Evidence

Generally, a regimen should be evaluated in a randomized fashion with an adequate patient sample to be considered a "green" regimen. We define adequate as 20 or more patients per arm. Non-randomized studies with 20 or more patients enrolled, and randomized studies with fewer than 20 patients per arm, are considered to be "yellow" regimens. Finally, case reports, retrospective series, and non-randomized studies with fewer than 20 patients enrolled are considered to be "red" regimens. Of course, there are finer gradations of the quality of evidence, so this simplified scheme should be taken with a grain of salt.

Evidence is thus reported using one of the three labels:

Strong evidence

Moderate evidence

Weak evidence
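
For contributors who prefer to see the rules spelled out, the thresholds above can be sketched in a few lines of Python. This is only an illustration of the text, not code used by the site; the names `Study`, `evidence_label`, and `ADEQUATE` are hypothetical.

```python
from dataclasses import dataclass

ADEQUATE = 20  # "20 or more patients per arm", as stated in the text above


@dataclass
class Study:
    # Hypothetical design values: "randomized", "non-randomized",
    # "retrospective", or "case report".
    design: str
    # Patients per arm for randomized studies; total enrolled otherwise.
    patients: int


def evidence_label(study: Study) -> str:
    """Return the evidence label implied by the simplified scheme."""
    # Case reports and retrospective series are weak regardless of size.
    if study.design in ("retrospective", "case report"):
        return "Weak evidence"
    enough = study.patients >= ADEQUATE
    if study.design == "randomized":
        return "Strong evidence" if enough else "Moderate evidence"
    if study.design == "non-randomized":
        return "Moderate evidence" if enough else "Weak evidence"
    raise ValueError(f"unknown study design: {study.design!r}")


print(evidence_label(Study("randomized", 120)))
print(evidence_label(Study("non-randomized", 12)))
```

For a trial with several arms, the same function could be applied arm by arm, since the label depends on the size of each arm.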

Examples

A trial with strong evidence: R-CHOP for untreated follicular lymphoma

Study | Evidence
Flinn et al. 2014 (BRIGHT) | Phase III


A trial with moderate evidence: bortezomib & rituximab for untreated follicular lymphoma

Study | Evidence
Evens et al. 2014 | Phase II


A trial with weak evidence: cladribine for aggressive systemic mastocytosis

Study | Evidence
Lim et al. 2009 | Retrospective


Frequently asked questions

Q: What is the current status of evidence labeling on hemonc.org?
A: Most chemotherapy regimens and their variants now have a level of evidence label; we are currently working on explicitly linking comparator arms for RCTs.

Q: If a randomized trial has more than two arms, will they all be labeled the same?
A: No, it depends on how many patients are in each arm of the trial. For arms that have 20 or more patients, the label is green. For arms with fewer than 20, the label is yellow.

Q: Are non-randomized trials all labeled the same?
A: No, it depends on how many patients are in the trial. For trials that have 20 or more patients, the label is yellow. For trials with fewer than 20, the label is red.

Q: Some retrospective analyses are very large; will they be labeled yellow?
A: No, currently we label all retrospective analyses as red (weak evidence), no matter how large. Although we are major proponents of secondary use of data, including automated methods of EHR data extraction, the level of unknown bias and confounding is currently too high to label these as anything other than weak evidence. Likewise, a trial that reports on a comparison to historic or contemporary controls not enrolled in that trial will be considered non-randomized.

Efficacy

Defined generally, efficacy is the presence of a positive effect on the study population. Conversely, lack of efficacy is the absence of an expected positive effect, or the failure to achieve expected outcomes in adequate numbers of patients. Efficacy can be reported using measures ranging from a weak surrogate (e.g., response rate) to a direct measure such as overall survival. Currently, we are focusing on comparative efficacy and overall response rates (ORRs). We report comparative efficacy only for randomized trials. Many non-randomized trials report efficacy compared to historical controls; however, in the rapidly developing field of oncology, this approach is rife with bias, and as such we do not report on comparisons to historical controls.

Efficacy is thus reported using a tri-color labeling:

Superior comparative efficacy

Equivalent comparative efficacy

Inferior comparative efficacy

More details

What we are really interested in is whether efficacy findings from a clinical trial will hold for our patient. Historically, we have relied on the cutoff of p=0.05 to decide whether a finding is significant. Of course, this means that when there is truly no difference between arms, approximately 1 in 20 comparisons will still come out falsely positive. This "holy grail" cutoff has contributed to significant publication bias, a problem well summarized by John Ioannidis in his paper "Why Most Published Research Findings Are False." One potential solution is to report comparative efficacy "in plain English" as shown in the graphic below (link to original article).

Efficacy.jpg

Here is another, only slightly tongue-in-cheek, way of considering p-values, from XKCD.

P values xkcd.png
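
The arithmetic behind the joke can be sketched briefly. Assuming the comparisons are independent and that no true effect exists in any of them (both are simplifying assumptions, and the function name below is hypothetical), the chance of seeing at least one "significant" result grows quickly with the number of comparisons:

```python
ALPHA = 0.05  # the conventional significance cutoff discussed above


def chance_of_false_positive(n_tests: int, alpha: float = ALPHA) -> float:
    """Probability of at least one false positive among n independent
    tests when there is no true effect in any of them."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 5, 20):
    print(n, round(chance_of_false_positive(n), 3))
```

With 20 independent comparisons, the chance of at least one false positive is roughly 64%, which is why a single p < 0.05 among many tested outcomes deserves caution.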

Examples

1. A treatment regimen with superior efficacy: BR for untreated follicular lymphoma

Study | Evidence | Comparator | Efficacy
Rummel et al. 2013 (StiL NHL1) | Phase III | R-CHOP | Superior PFS


2. A treatment regimen which failed to demonstrate a difference in overall survival: MCP for untreated follicular lymphoma

Study | Evidence | Comparator | Efficacy
Nickenig et al. 2006 | Phase III | CHOP | Seems not superior


3. A treatment regimen that is non-inferior to its comparator: Bendamustine for relapsed/refractory CLL

Study | Evidence | Comparator | Efficacy
Niederle et al. 2013 | Phase III | Fludarabine | Seems non-inferior

4. A treatment regimen with inferior efficacy: MCP for untreated follicular lymphoma

Note that this is the same regimen as #2, above; it is the comparative efficacy that is different.

Study | Evidence | Comparator | Efficacy
Herold et al. 2007 | Phase III | R-MCP | Inferior OS


Frequently asked questions

Q: What is the current status of efficacy labeling on hemonc.org?
A: Some pages are more complete than others; for example, the follicular lymphoma page is now completely labeled for efficacy.

Q: How do we choose to label efficacy when multiple outcomes are reported?
A: Often, a trial will report on multiple outcomes, such as overall response rate, progression-free survival, and overall survival. In this case, we generally look to the PRIMARY endpoint, as defined in the published methods. However, if a secondary endpoint shows differential efficacy and is less "surrogate" than the primary endpoint (see below), we will label by that endpoint.

Q: How do you distinguish between a failed superiority trial and a successful non-inferiority or equivalence study?
A: Both would be labeled yellow, but the language differs slightly, as in the examples above. In a failed superiority trial, all arms are labeled "seems not superior," whereas in a successful non-inferiority trial, the arms are labeled "seems non-inferior." Here is an example of ABVD followed by radiation therapy in early-stage unfavorable Hodgkin lymphoma, where both types of trials have been used:

Study | Evidence | Comparator | Efficacy
Raemaekers et al. 2014 (EORTC/LYSA/FIL H10) | Phase III | ABVD x 6 | Inconclusive whether noninferior
Advani et al. 2015 (E2496) | Phase III | Stanford V -> RT | Seems not superior
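
The wording rules in this answer can be sketched as a small lookup. This is a hypothetical helper, not site code: the function name and argument values are invented for illustration, and the green color for a met superiority objective is an assumption drawn from the traffic-signal scheme rather than something stated in this answer. The text does not define a color for a non-inferiority trial that misses its objective, so the sketch declines to guess.

```python
from typing import Tuple


def efficacy_label(design: str, met_objective: bool,
                   endpoint: str = "") -> Tuple[str, str]:
    """Return (color, label text) for the experimental arm of a trial.

    design: "superiority" or "non-inferiority" (hypothetical values).
    met_objective: whether the trial met its stated objective.
    endpoint: abbreviation appended to a superiority label, e.g. "PFS".
    """
    if design == "superiority":
        if met_objective:
            # Assumed green, following the traffic-signal convention.
            return ("green", f"Superior {endpoint}".strip())
        return ("yellow", "Seems not superior")
    if design == "non-inferiority":
        if met_objective:
            return ("yellow", "Seems non-inferior")
        # Not defined in the text above; left unhandled on purpose.
        raise ValueError("no rule given for a failed non-inferiority trial")
    raise ValueError(f"unknown design: {design!r}")


print(efficacy_label("superiority", False))
print(efficacy_label("non-inferiority", True))
```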

Q: Do you have a hierarchy of surrogacy?
A: Yes, this is the hierarchy that we use to determine the strength of an outcome measure:

Strong outcomes

  • Overall survival (OS)
  • All-cause mortality
  • Disease-specific mortality

Intermediate outcomes

  • Disease-free interval (DFI)
  • Disease-free survival (DFS)
  • Durable response rate (DRR)
  • Duration of response (DOR)
  • Event-free survival (EFS): events are sometimes defined differently, but usually include relapse, progression, and death from any cause.
  • Freedom from first progression (FFFP)
  • Failure-free survival (FFS): defined as the absence of an additional systemic therapy, relapse, or non-relapse mortality.
  • Freedom from treatment failure (FFTF)
  • Progression-free survival (PFS): the most commonly used surrogate time-based measure.
    • Progression-free survival rate at 6 months (PFS6)
  • Relapse-free interval (RFI): not commonly used outside of the adjuvant setting.
  • Relapse-free survival (RFS): not commonly used outside of the adjuvant setting.
  • Time to next treatment (TTNT)
  • Time to treatment failure (TTTF)

Weak outcomes

  • Response rate (RR); definitions of the following may vary across cancer subtypes:
    • Complete response rate (CR rate)
    • Minimal response rate (MR rate)
    • Near complete response rate (nCR rate)
    • Partial response rate (PR rate)
    • Stable disease rate (SD rate)
    • Stringent complete response rate (sCR rate)
    • Unconfirmed complete response rate (CRu rate)
    • Very good partial response rate (VGPR rate)
  • Overall response rate (ORR): definition may vary across cancer subtypes, but usually this is the sum of the CR + PR rates.
  • Disease control rate (DCR): usually this is the sum of the CR + PR + SD rates.
  • Clinical feasibility rate: defined as no grade 4 neutropenia/thrombocytopenia or thrombocytopenia with bleeding, no grade 3/4 febrile neutropenia or non-hematological toxicity; no premature withdrawal/death.
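
Combined with the primary-endpoint rule from the earlier answer, this hierarchy suggests a simple selection procedure, sketched here under stated assumptions. Only a handful of endpoints are included, the numeric tiers and the function name are hypothetical, and the text does not say how to break ties within a tier (the sketch keeps the first candidate found).

```python
# Tier 3 = strong, 2 = intermediate, 1 = weak, following the lists above.
STRENGTH = {
    "OS": 3,
    "PFS": 2, "EFS": 2, "DFS": 2, "TTNT": 2,
    "ORR": 1, "CR rate": 1, "DCR": 1,
}


def endpoint_for_label(primary: str, differing_secondaries: list) -> str:
    """Pick the endpoint to label by: the primary endpoint, unless a
    secondary endpoint that shows differential efficacy sits in a
    stronger (less surrogate) tier."""
    best = primary
    for endpoint in differing_secondaries:
        if STRENGTH[endpoint] > STRENGTH[best]:
            best = endpoint
    return best


print(endpoint_for_label("ORR", ["PFS"]))
print(endpoint_for_label("PFS", ["ORR"]))
```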

Q: What about exceptional responders?
A: It is increasingly recognized, especially with newer therapies such as immunotherapy, that some patients may experience a remarkable response to a drug that otherwise appears to lack efficacy in the population. These patients are usually referred to as "exceptional responders" and may provide significant insights into rational treatment selection, a.k.a. precision medicine. At this time we do not make a particular effort to identify exceptional responders, nor do we consider a regimen for inclusion in HemOnc.org if the manuscript states that it generally lacks efficacy.

Q: Do you consider quality of life (QoL) measures in efficacy?
A: Very few RCTs report on QoL measures, and as such we do not currently include them in the consideration. This may change in the future.

Toxicity

Defined generally, toxicity is the presence or absence of a negative effect (harm) on the study population. This is often also referred to as safety. As with efficacy, we only report comparative toxicity.

Toxicity is thus reported using one of the three labels:

Decreased toxicity

Equivalent toxicity

Increased toxicity

Examples

A treatment regimen with increased toxicity: R-CHOP for untreated follicular lymphoma

Study | Evidence | Comparator | Efficacy | Toxicity
Hiddemann et al. 2005 | Phase III | CHOP | Increased OS | Increased toxicity


Frequently asked questions

Q: What is the current status of toxicity labeling on hemonc.org?
A: A few regimens are currently labeled for toxicity; we are focusing current efforts on labeling comparators and efficacy.

Q: Are you basing the label on the reported CTCAE measures?
A: CTCAE measures are extremely valuable in that they are structured and thus reproducible. However, it is often hard to compare them directly. For example, if one regimen has grade 4 lab-based toxicity and the other has grade 2 gastrointestinal toxicity, which is the more toxic? In general, we plan to use the authors' interpretation of overall toxicity and tolerability when labeling - or better yet, prospectively-gathered quality-of-life data (see below).

Q: Do you plan to incorporate patient-reported outcomes?
A: As shown in numerous publications, patient reports of toxicity are more accurate than clinician assessments. However, they have not until recently been standardized. Now that the PRO-CTCAE is available, we expect to see more of these in the future and will incorporate them into the toxicity assessment.

Example code (for contributors)

Current style

Phase I

Red label code: style="background-color:#ff0000"|Phase I

Phase II

Yellow label code: style="background-color:#EEEE00"|Phase II

Phase III

Green label code: style="background-color:#00CD00"|Phase III

Older style

Case report Red label code: <span style="background:#ff0000; padding:3px 6px 3px 6px; border-color:black; border-width:2px; border-style:solid;">Case report</span>

Phase II Yellow label code: <span style="background:#EEEE00; padding:3px 6px 3px 6px; border-color:black; border-width:2px; border-style:solid;">Phase II</span>

Phase III Green label code: <span style="background:#00CD00; padding:3px 6px 3px 6px; border-color:black; border-width:2px; border-style:solid;">Phase III</span>
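
Contributors who generate many labels at once may find it convenient to build the markup programmatically. The following is a minimal sketch that reproduces the label code shown above; the function names and the `COLORS` table are hypothetical, and only the color values and style strings come from this page.

```python
# Hex values copied from the label code above.
COLORS = {"red": "#ff0000", "yellow": "#EEEE00", "green": "#00CD00"}


def current_style_cell(color: str, text: str) -> str:
    """Build the current-style table cell label code."""
    return f'style="background-color:{COLORS[color]}"|{text}'


def older_style_span(color: str, text: str) -> str:
    """Build the older-style inline span label code."""
    return (
        f'<span style="background:{COLORS[color]}; padding:3px 6px 3px 6px; '
        'border-color:black; border-width:2px; border-style:solid;">'
        f'{text}</span>'
    )


print(current_style_cell("green", "Phase III"))
print(older_style_span("red", "Case report"))
```

Inside a wiki table, the current-style code goes after the leading cell pipe, so the generated string would be prefixed with "|" on its own table line.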