EWG's Guide to
Healthy Cleaning

Scoring Substances & Products

Scoring Definitions and Structure

Finding – A conclusion associated with a peer-reviewed study, agency-determined categorization or data point contained in a data set.

Example 1 – Dataset: EPA cancer determinations for chemicals. Finding: EPA determination for benzene = A1; known human carcinogen.
Example 2 – Peer-reviewed study. Finding: Doe et al. found that boric acid is an endocrine-disrupting compound.

Endpoint – An effect or specific parameter associated with a finding. Endpoints include asthma, reproductive toxicity and chronic aquatic toxicity. EWG assigns an endpoint to each finding. EWG calculates separate hazard scores for each endpoint.

Endpoint group – EWG assigns each endpoint to one or more endpoint groups organized by biological system (health) or ecological aspect (environment). For example, the endpoint group “cancer” collects the endpoints for specific types of cancer (e.g. colon cancer, breast cancer, lung cancer). Likewise, the endpoint group “dermal toxicity/disease” includes endpoints such as “skin irritation,” “skin corrosion” and “dermatitis.” After calculating a separate score for each endpoint, EWG calculates a single, consolidated score for the endpoint group.

Scoring subcategories – EWG assigns each endpoint group to one scoring subcategory. An example is the “hazard” subcategory, which includes the endpoint groups of various health hazards (e.g. cancer, endocrine disruption, respiratory toxicity). The subcategory “risk-based values” includes numeric values for acute toxicity parameters, such as LD50 (the dose that is lethal to 50 percent of the test population).

Scoring category – EWG assigns each endpoint group, separated by scoring subcategory, to a broad scoring category, either health or environment. EWG calculates an overall substance score as a function of each relevant scoring category. EWG assigns weights to scoring categories to set their relative contribution to the overall score of a substance or product.

Example of a tiered structure for scoring categories, subcategories, endpoint groups, and endpoints:

  • Health (scoring category)
    • Hazard (scoring subcategory)
      • Developmental, endocrine and reproductive toxicity (endpoint group)
        • Damage to fertility (endpoint)
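
The tiered structure above can be sketched as a small set of nested records. This is an illustrative sketch only: the class names and the `endpoint_paths` helper are hypothetical, and the tree shape is a simplification, since an endpoint may belong to more than one endpoint group.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EndpointGroup:
    name: str
    endpoints: List[str] = field(default_factory=list)


@dataclass
class Subcategory:
    name: str
    endpoint_groups: List[EndpointGroup] = field(default_factory=list)


@dataclass
class Category:
    name: str
    subcategories: List[Subcategory] = field(default_factory=list)


def endpoint_paths(category: Category) -> List[Tuple[str, str, str, str]]:
    """List every (category, subcategory, endpoint group, endpoint) path."""
    return [
        (category.name, sub.name, group.name, endpoint)
        for sub in category.subcategories
        for group in sub.endpoint_groups
        for endpoint in group.endpoints
    ]


# The single example path given in the text.
health = Category(
    "Health",
    [
        Subcategory(
            "Hazard",
            [
                EndpointGroup(
                    "Developmental, endocrine and reproductive toxicity",
                    ["Damage to fertility"],
                )
            ],
        )
    ],
)
```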

Scoring a Substance

Step 1: Information collection
Scientific datasets, peer-reviewed literature and other information sources are collected and reviewed to identify specific findings relevant to health and the environment that are applicable to substances in the database. Findings include determinations that a substance causes or may cause specific diseases such as breast cancer or asthma, which are considered endpoints. A finding may also indicate that a substance does not pose a risk to human health or the environment.

Step 2: Assessing the data and assigning numeric values
Each finding is assigned to an endpoint that dictates what question is being evaluated, for example: does this substance cause asthma? EWG then evaluates what each finding says about the safety or hazardousness of a substance and assigns it appropriate scoring values. The values depend on:

  1. Strength of association – A numeric value assigned to each finding that reflects how strongly that finding is associated with a particular endpoint, such as asthma. The strength of association values and their corresponding definitions are listed below.
    Strength of association   Definition
    -10                       No association, confirmed
    -5                        Limited evidence of no association
    0                         Unknown (no data or conflicting data)
    5                         Limited evidence of an association
    8                         Likely/probable association
    10                        Confirmed association

    Example: The US Environmental Protection Agency finds that benzene is a known human carcinogen. EWG assigns this finding a strength of association of 10 = Confirmed association.

  2. Scope of assessment – A categorization determined by the credibility and comprehensiveness of the data. A finding or determination from a comprehensive agency assessment, such as a National Toxicology Program (NTP) report on carcinogenicity, will be given a higher value than a finding generated from a single peer-reviewed study. EWG assigns a “scope tier” to each finding as defined in the table below.
    Each scope tier is described by the nature of the data considered or generated in the dataset or study, who evaluated or generated the data (authors), the form of documentation, and examples.

    • Scope tier 6
      • Nature of data: Absolute, comprehensive weight-of-the-evidence assessment based on data sufficient for complete understanding and certainty or scientific fact.
      • Authors: Finding from a completely authoritative scientific body or known as fact.
      • Documentation: Full, publicly available documentation of data collection, methods, and conclusions.
      • Examples: Oxygen is a gas at standard temperature and pressure.
    • Scope tier 5
      • Nature of data: Comprehensive, weight-of-the-evidence evaluation of available scientific information. Data is relevant to a broad issue.
      • Authors: Authoritative government agency or permanent, non-governmental body with accepted, comparable authoritative standing.
      • Documentation: Full, publicly available documentation of data collection, methods, and conclusions. Findings may be based on specific criteria.
      • Examples: IARC[1] Monographs on the Evaluation of Carcinogenic Risks to Humans, the UNECE[2] Globally Harmonized System of Classification and Labelling of Chemicals.
    • Scope tier 4
      • Nature of data: Weight-of-the-evidence evaluation of available scientific information. May be limited to evaluation of data relevant to one aspect of a broader issue.
      • Authors: Authoritative body of experts, including standing and ad-hoc government and industry advisory panels.
      • Documentation: Full, publicly available documentation of data, methods, and conclusions. Findings may be based on specific criteria.
      • Examples: An EPA[3] committee report on endocrine disrupting chemicals. A JECFA[4] toxicological evaluation of a food additive.
    • Scope tier 3
      • Nature of data: Weight-of-the-evidence evaluation of available scientific information on a limited subject.
      • Authors: Qualified expert(s).
      • Documentation: Independently evaluated and published in a peer-reviewed journal. Findings may be based on specific criteria.
      • Examples: A review article on neurotoxic compounds that evaluates a significant body of information from a variety of data sources. A CIR[5] panel final report on a class of fragrance chemicals.
    • Scope tier 2
      • Nature of data: Evaluation of credible scientific data on a limited or single subject.
      • Authors: Qualified expert(s).
      • Documentation: Independently evaluated and published in a peer-reviewed journal, or authored by a credible source.
      • Examples: A published scientific study on the reproductive toxicity of a particular substance.
    • Scope tier 1
      • Nature of data: A finding generated from screening-level testing or a method less rigorous than alternate tests that more closely approximate real-world conditions.
      • Authors: Qualified expert(s).
      • Documentation: Findings reported by a credible source, such as a government agency.
      • Examples: Screening-level studies including in-vitro testing and reports of a single data point.

    [1] International Agency for Research on Cancer
    [2] United Nations Economic Commission for Europe
    [3] US Environmental Protection Agency
    [4] Joint FAO/WHO Expert Committee on Food Additives
    [5] Cosmetic Ingredient Review

    Example: The US Environmental Protection Agency's finding that benzene is a known human carcinogen was based on a rigorous, comprehensive review of all available data on all routes of exposure, and the agency makes its review process publicly available. EWG assigns this finding to scope tier 5.

  3. Severity – EWG assigns a numeric weight called severity to each endpoint, which is intended to reflect the relative impact of each endpoint on health, the environment and other factors considered in EWG’s ratings. Severity values are scaled in high, moderate and low categories according to their relative effects. Low values typically indicate safer, more desirable attributes.

    • High – Includes endpoints associated with:
      • Systemic diseases that can be significantly debilitating or fatal, including cancer and asthma.
      • Conditions that can cause multi-generational impacts, including endocrine disruption, developmental effects, and reproductive problems.
      • Permanent damage to the skin, such as that caused by caustic substances.
      • Classification of high acute or chronic toxicity based on a numeric value, such as a very low No Observable Effect Level (NOEL).
    • Moderate – Includes endpoints associated with:
      • Significant but reversible physical discomfort, including respiratory or skin irritation.
      • Dermal and respiratory allergies.
      • Classification of moderate acute or chronic toxicity based on a numeric value, such as a mid-range NOEL.
    • Low – Includes endpoints associated with:
      • Little to no potential for ecological toxicity, such as ready biodegradability.
      • Little to no potential for adverse effects to human health, such as an EPA Group E designation (evidence of non-carcinogenicity for humans).
      • Classification of low acute or chronic toxicity based on a numeric value, such as a high NOEL.
  4. Uncertainty (u) – A value that EWG assigns to each finding, from among the categories below, that reflects the extent of uncertainty encompassed by a particular finding. EWG uses this parameter to reflect cases in which uncertainty is greater than what is typical for data within a given tier. Uncertainty may reflect data collection gaps and method limitations.

    The assigned uncertainty values (u) are defined below:

    • Default – typical or unassessed certainty: u = 1.0
    • Limited uncertainty – exceeding typical: u = 0.5
    • No data: u = 0
    Example: Uncertainty. A review paper reports that a substance is not associated with reproductive toxicity and provides limited data to support that claim. Several independent, peer-reviewed studies that show the substance is associated with reproductive toxicity are not mentioned in the review. The finding that the substance is not a reproductive toxicant may therefore be assigned a Limited uncertainty value of 0.5.
  5. Route r(route) – An attribute that EWG assigns to each finding relevant to the route of exposure. The likelihood of exposure to a substance in a household cleaning product depends on the product type and form. This attribute is only utilized when performing calculations at the product level. This factor is intended to down-weight exposures that are unanticipated with consumer use. Score modification based on product route is relevant to the following scenarios:

    Example: Exposure route. For aerosol products, the likelihood of exposure through inhalation is greater than the likelihood of exposure through ingestion. Therefore endpoints specific to inhalation will drive the hazard score for that product.

    The assigned route values r(route) are defined as:

    • Default: Relevant route of exposure: r(route) = 1.0
    • Unlikely route of exposure: r(route) = 0.5
    • No anticipated exposure: r(route) = 0
  6. Misuse m(misuse) – An attribute that EWG assigns to each finding from among the categories below, indicating that the hazard is relevant only in cases of misuse. This attribute is only utilized when doing calculations at the product level. EWG uses this parameter to down-weight hazard information that would be applicable only in unlikely or unanticipated use scenarios, including accidental ingestion or intentional inhalation. Score modification based on product misuse is relevant to the following scenario:

    Example: Type of misuse. Some health effects would only occur if the product were intentionally misused, such as coma or death resulting from inhaling a propellant. This can be accounted for by using the misuse modification.

    The assigned misuse values m(misuse) are defined as:

    • Default: Proper usage: m(misuse) = 1.0
    • Accidental misuse: m(misuse) = 0.5
    • Intentional misuse (exposure would only occur if the product were purposefully misused): m(misuse) = 0
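
The per-finding attributes listed above can be gathered into one record. The following is a minimal sketch, not EWG's implementation: the `Finding` class and its field names are hypothetical, and only the allowed values come from the lists above. Severity is left out because the text assigns it to endpoints rather than to individual findings.

```python
from dataclasses import dataclass

# Allowed values, taken from the lists above.
ALLOWED_STRENGTHS = (-10, -5, 0, 5, 8, 10)
ALLOWED_FACTORS = (0.0, 0.5, 1.0)  # shared by uncertainty, route and misuse


@dataclass(frozen=True)
class Finding:
    endpoint: str
    strength_of_association: int
    scope_tier: int
    uncertainty: float = 1.0  # default: typical or unassessed certainty
    route: float = 1.0        # default: relevant route of exposure
    misuse: float = 1.0       # default: proper usage

    def __post_init__(self):
        if self.strength_of_association not in ALLOWED_STRENGTHS:
            raise ValueError("unknown strength of association value")
        if not 1 <= self.scope_tier <= 6:
            raise ValueError("scope tier must be between 1 and 6")
        for name in ("uncertainty", "route", "misuse"):
            if getattr(self, name) not in ALLOWED_FACTORS:
                raise ValueError(f"{name} must be 0, 0.5 or 1.0")


# The benzene example from the text: confirmed association, scope tier 5.
benzene_cancer = Finding("cancer", strength_of_association=10, scope_tier=5)
```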

Calculating a score for a single endpoint. EWG combines the findings from multiple datasets and makes a determination about the overall strength of association, positive or negative, between a substance and a particular health or environmental endpoint. This determination is based on:

  1. The weight of the evidence. The overall strength of association between a hazard and a substance will be driven by the amount of evidence that supports that association.
  2. Resolving conflicting data. If there are conflicting findings, for example if one agency determines that a substance is a known endocrine disruptor and another finds it does not have enough data to make a determination, EWG will take a precautionary position and assign a value that reflects some association of hazard.

Algorithms for consolidating findings for a single endpoint. The numerical score for a single endpoint for a single substance in the database must account for many findings that vary in their strength of association, scope of assessment, and uncertainty. The goal is to determine an overall strength of association for a given endpoint, considering all findings for that endpoint. The result will be an aggregate strength of association value ranging between -10 and +10. A +10 score indicates that a substance is definitively associated with an endpoint (such as a substance known to be a carcinogen), while a -10 score indicates that a substance is definitively not associated with an endpoint (such as a substance that is known to be non-carcinogenic).

Step 3: Consolidating findings at each scope of assessment tier
EWG calculates a separate score for each scope of assessment tier. Each tier score incorporates all findings that fall within that tier. The tier score represents a weighted average of the strength of association for the findings at a given tier.

Equation 1

Findings average equation

where

Findings average equation
where SA = strength of association, u = uncertainty for each finding, severity = severity score, and w is a weighting factor that resolves cases of conflicting findings, in which some findings show an association between a substance and an endpoint and others do not. In these cases, the weighting factor serves to emphasize the findings that show an association, reflecting the precautionary principle. The weighting function increases linearly with increased strength of association.
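
To make the idea of a precaution-weighted average concrete, here is a rough sketch. It is not Equation 1: the exact formula is not reproduced in this text, so the form of the average, the hypothetical `precaution_weight` function (a linear weight rising from 1 to 2) and the omission of severity are all assumptions made purely for illustration.

```python
def precaution_weight(sa: float) -> float:
    """Hypothetical linear weight: 1.0 at SA = -10, rising to 2.0 at SA = +10."""
    return 1.0 + (sa + 10.0) / 20.0


def tier_score(findings) -> float:
    """Weighted average of strength of association for one tier.

    `findings` is an iterable of (strength_of_association, uncertainty)
    pairs. Each finding is weighted by u * w(SA), an assumed combination.
    """
    numerator = 0.0
    denominator = 0.0
    for sa, u in findings:
        weight = u * precaution_weight(sa)
        numerator += weight * sa
        denominator += weight
    # With no usable findings (empty input or all u = 0), treat as unknown.
    return numerator / denominator if denominator else 0.0
```

Under these assumptions, two equally certain findings of +10 and -10 do not cancel out. The result is positive because the finding that shows an association carries the larger weight.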

Step 4: Consolidating scope of assessment tier scores into an endpoint score
EWG consolidates tier scores into a single endpoint score using tier weights, which depend on the number of studies in each tier. The endpoint score ranges from -10 to +10.

Equation 2

Endpoint score
where TS is the tier score, TW is the tier weight, and Max(TW) is the maximum calculated tier weight, across all tiers, as described in Equation 3 below. The sums are performed over all scope of assessment tiers (0 to 6).

Calculating the tier weight is accomplished using the following equation:

Equation 3

Tier Weight
where (see table below for values associated with these variables):
   TWmax is the maximum possible weight for a tier
   TWmin is the minimum possible weight for a tier
   ΔTW = TWmax - TWmin
   n90 is the number of studies required to reach TWmin + 90%(ΔTW)
   nfindingsmax is the maximum number of findings on a tier that fall at the same strength of association

   Scope of assessment tier   TWmin   TWmax   n90
   6                          10      10      1
   5                          9       9.9     2
   4                          6       9.5     3
   3                          4       7.5     5
   2                          2       4.0     10
   1                          0.5     1.0     15

When multiple findings on a tier come to similar conclusions as represented by the strength of association, those findings taken together have an increased tier weight (TWmax). This reflects the idea that there is a higher level of consensus on the question of how strongly a substance is associated with an endpoint.
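
The table values can be used to work out the weight each tier reaches at its n90 point, using only the stated definition of n90 (the number of studies required to reach TWmin + 90% of ΔTW). This sketch does not reproduce Equation 3 and makes no claim about how the weight grows between TWmin and that point. The names `TIER_PARAMS` and `weight_at_n90` are hypothetical.

```python
# Tier parameters transcribed from the table above: tier -> (TWmin, TWmax, n90).
TIER_PARAMS = {
    6: (10.0, 10.0, 1),
    5: (9.0, 9.9, 2),
    4: (6.0, 9.5, 3),
    3: (4.0, 7.5, 5),
    2: (2.0, 4.0, 10),
    1: (0.5, 1.0, 15),
}


def weight_at_n90(tier: int) -> float:
    """Tier weight reached once n90 concordant studies are available."""
    tw_min, tw_max, _n90 = TIER_PARAMS[tier]
    delta_tw = tw_max - tw_min
    return tw_min + 0.9 * delta_tw
```

For example, three concordant tier 4 findings correspond to a weight of about 9.15, which is still below the minimum weight of a single tier 6 finding (10).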

Step 5: Calculating the consolidated scores for endpoint groups and categories
Currently we have identified three methods to consolidate hazard scores at different stages in substance scoring (endpoint group or category scoring), which may also be applied when calculating substance group or product scores.

Method 1: Maximum score
The consolidated score is taken as the highest single score from all endpoints, groups, or subcategories.

Equation 4

$$\text{Consolidated score} = \max(S)$$
where S is the score for each endpoint, group, or category being consolidated.

Method 2: Weight of evidence score
The consolidated score is a combination of all severity scores with a weighting factor that emphasizes the findings that show an association, reflecting the practice of precaution when data conflict.

Equation 5

Weight of Evidence
where S is the score from each endpoint, group, or category being consolidated; Sneg are scores < 0; Smax is the maximum score; the sum of logs is limited to scores > 1; and wea is a weight of evidence adjustment given by:

Equation 6

Weight of Evidence score
where rank is an ordered list of scores ranging from 0 to 10, ranked from highest (10) to lowest (0).

Method 3: Weighted average score
The consolidated score is the weighted average of the elements being collected. In this method, each sub-element must be assigned a weighting factor between 0 and 1.

Equation 7

Consolidated score
where S is the score from the sub-element and weight is assigned for each sub-element.
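
Methods 1 and 3 are simple enough to sketch directly from their descriptions. The function names are hypothetical, and Method 3 is read here as a weighted average in the usual normalized sense (dividing by the sum of the weights), which is an assumption because Equation 7 is not reproduced in this text. Method 2 is omitted because its formula cannot be recovered from the prose alone.

```python
def max_score(scores):
    """Method 1: the consolidated score is the highest single score."""
    return max(scores)


def weighted_average_score(scores, weights):
    """Method 3: weighted average, with each weight between 0 and 1."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 or w > 1 for w in weights):
        raise ValueError("each weight must be between 0 and 1")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be greater than 0")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight
```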

Determining the endpoint group score. Each “endpoint group score” is calculated from all the endpoints within that group by using Method 2: Weight of evidence score.

Determining the category score. Different endpoints associated with the substance are combined to determine an overall category score for either “health” or “environment.” EWG assigns a category score to a substance based on the number of health or environmental endpoints (and their associated hazard). For example, a substance that shows very weak evidence of hazard to human health would receive a fair health category score, while a substance that is a probable carcinogen, a known endocrine disruptor and causes asthma would receive a bad health category score value.

The “health” or “environment” category scores are also calculated using Method 2: Weight of evidence score, where all of the endpoints for that category are combined to yield an overall category score.

Step 6: Combining “health” and “environment” category scores to determine a substance or substance group hazard score
After the health and environment category scores are calculated, they are combined to generate an ingredient’s hazard score.

Equation 8

Combined hazard score

where (see table below for values associated with these variables):
   Hs = Health hazard score
   Es = Environment hazard score

Calculating the substance group score for chemical or functional classes containing multiple substances. The scores for substance groups are calculated using Method 1: Maximum score. The maximum score reflects the hazards associated with using the most hazardous ingredient that could be described by the non-specific ingredient name.

Step 7: Calculating an ingredient hazard score for a product
The scores for each substance in the product are combined to determine an overall score indicating the hazardousness of the ingredients in a product. If a product contains several ingredients with high hazard scores, the hazard score for that product will also be high. Likewise if the ingredients in a product are generally low hazard, the score will be low. Special considerations will affect the overall hazard score. For example, some substances are more hazardous through specific routes of exposure or in certain concentrations. The types of exposures associated with the product and ingredient concentrations will be factored into the score.

The overall ingredient hazard score for a product is calculated for both health and environment categories by using Method 3: Weight of Evidence.

Equation 1 is modified to reflect a down-grading of health endpoints relating to exposures that are unanticipated with consumer use, using the r(route) attribute associated with each finding. The m(misuse) attribute, if applicable, may also be factored in using the same modified equation:

Modified Equation 1

Modified Equation
where m(misuse)*r(route) modifies the equation to account for product misuse and route of exposure
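
The combined modifier m(misuse)*r(route) can be illustrated with the attribute values defined in Step 2. This sketch computes only the multiplier. How it enters Modified Equation 1 is not shown in this text, and the dictionary keys and function name are hypothetical labels.

```python
# Attribute values from Step 2 (items 5 and 6).
ROUTE_VALUES = {"relevant": 1.0, "unlikely": 0.5, "none": 0.0}
MISUSE_VALUES = {"proper": 1.0, "accidental": 0.5, "intentional": 0.0}


def exposure_modifier(route: str, misuse: str) -> float:
    """Return m(misuse) * r(route) for a single finding."""
    return MISUSE_VALUES[misuse] * ROUTE_VALUES[route]
```

A finding that applies only to intentional misuse, such as coma from deliberately inhaling a propellant, gets a modifier of 0 whatever the route. A finding tied to an unlikely route under accidental misuse gets 0.25.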

In addition, when appropriate, EWG may not score a product ingredient for acute health effects, such as skin corrosion, if the concentration of that ingredient in the product would not result in the health endpoint. This does not apply to any endpoint relevant to chronic health effects, such as endocrine disruption or cancer. Endpoints that are acute and concentration-dependent would therefore be eliminated from the score for the ingredient in question when calculating the product score.

Example: Sodium hydroxide can cause acute irreversible and severe skin and eye damage in its concentrated form. However, it is also used in diluted amounts in many products to adjust the pH. Therefore, sodium hydroxide should not receive a hazard score for skin or eye corrosion if the concentration used in the product in question would not be sufficient to result in this type of damage.

Step 8: Calculating a product disclosure score
EWG believes consumers have a right to know as much as possible about product ingredients, and that companies should label their products appropriately. Product labels are the best source of health and hazard information available to consumers when they are deciding what to buy. Failing to disclose ingredient information on the label, or using vague words to describe specific ingredients, negatively impacts a product’s score.

Disclosure scoring criteria are consistent with the following principles:

  • It is preferable to list a specific substance name rather than a descriptive chemical group (e.g., alcohol ethoxylate) or a cleaners-relevant functional group (e.g., “surfactant”).
  • It is preferable to list a substance function without the substance name than to list nothing at all.
  • It is preferable to list some information on the label rather than none, even if it is less than other sources list. For example, a product with four ingredients on the label and eight on the company website will score better than a product with no ingredients on the label and eight on the website.
  • Product must have label information and a website to be scored. If the label is unavailable for analysis we use the average label score for the brand. If no labels are available for a brand, we assume label disclosure falls between none and the level provided on the website.
  • Assumptions are made about the number of ingredients in cleaning products. If a product label shows fewer than four ingredients, there will be a deduction in disclosure scoring after manual checking to make sure this rule should apply.
  • EWG encourages manufacturers to make available worker.

Equation 9

Label Disclosure Equation
where Label Disclosure ranges from 0 to 12, Website Disclosure ranges from 0 to 6, and MSDS Disclosure ranges from 0 to 2. Final scores therefore range from -10 (complete disclosure) to +10 (no disclosure).

Label Disclosure is calculated as follows:

  • 12 points if all ingredients listed are specific substances rather than chemical or functional groups or terms like “fragrance” that can hide a multitude of unlisted chemicals
  • 10 points if more than 75 percent of ingredients listed are specific
  • 6 points if more than 50 percent of ingredients listed are specific
  • 4 points if more than 25 percent of ingredients listed are specific
  • 2 points if any ingredients are listed
  • score halved if fewer than 4 ingredients
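
The label point schedule above could be sketched as follows. The function name and arguments are hypothetical. The sketch assumes that "more than" thresholds are strict, that the halving rule is triggered by the count of listed ingredients, and that a label with no ingredients earns 0 points. The manual check mentioned in the disclosure principles is not modeled.

```python
def label_disclosure_points(n_specific: int, n_listed: int) -> float:
    """Points (0 to 12) for a label listing n_listed ingredients,
    n_specific of which are specific substance names."""
    if n_specific < 0 or n_specific > max(n_listed, 0):
        raise ValueError("n_specific must be between 0 and n_listed")
    if n_listed <= 0:
        return 0.0  # nothing listed on the label

    fraction = n_specific / n_listed
    if n_specific == n_listed:
        points = 12.0
    elif fraction > 0.75:
        points = 10.0
    elif fraction > 0.50:
        points = 6.0
    elif fraction > 0.25:
        points = 4.0
    else:
        points = 2.0  # some ingredients are listed

    if n_listed < 4:
        points /= 2  # score halved if fewer than 4 ingredients
    return points
```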

Website Disclosure is calculated as follows:

  • 6 points if all ingredients listed are specific substances rather than chemical or functional groups or terms like “fragrance” that can hide a multitude of unlisted chemicals
  • 5 points if more than 75 percent of ingredients listed are specific
  • 3 points if more than 50 percent of ingredients listed are specific
  • 2 points if more than 25 percent of ingredients listed are specific
  • 1 point if any ingredients are listed
  • score halved if fewer than 4 ingredients are listed
  • if no website information is available, half the label score is used
  • if a product label has a machine-scannable barcode or QR code that allows consumers to access further ingredient information at point of sale, the website disclosure score is multiplied by 1.5 and it is calculated from a possible nine points.

As of March 01, 2021, EWG automatically assigns 2 points for MSDS Disclosure.

MSDS Disclosure was previously calculated as follows:

  • 2 points for an available MSDS
  • 1 point for an MSDS that lists ingredients not found on the label or website
  • 0 points when no MSDS is available

The ingredient disclosure score is then combined with the ingredient hazard score to calculate an initial product score.
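
The stated ranges (label 0 to 12, website 0 to 6, MSDS 0 to 2, final score from -10 for complete disclosure to +10 for none) are consistent with a disclosure score of 10 minus the sum of the three components. That form is an assumption, since Equation 9 is not reproduced in this text, and the sketch below uses it for illustration only. The function name is hypothetical, and clamping the result to the range -10 to +10 is also an assumption.

```python
from typing import Optional


def disclosure_score(
    label_points: float,
    website_points: Optional[float],
    msds_points: float = 2.0,
) -> float:
    """Assumed form of the disclosure score: 10 - (label + website + MSDS).

    Pass website_points=None when no website information is available;
    half the label score is then used, as the text specifies.
    msds_points defaults to the 2 points assigned as of March 01, 2021.
    """
    if website_points is None:
        website_points = label_points / 2
    total = label_points + website_points + msds_points
    return max(-10.0, min(10.0, 10.0 - total))
```

With full points everywhere (12, 6 and 2) the result is -10, and with no points at all it is +10, matching the endpoints stated for the final score.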

Step 9: Combining the ingredient hazard and disclosure scores and calculating a final product score
The overall ingredient hazard score and the disclosure score are combined. The scores are effectively combined using a weighted average where scores indicating hazards or poor disclosure are the driving factors. With good disclosure and a lack of hazard information, an average of the scores is taken.

Equation 10

If the combined ingredient score is < 2:

Initial Product score
where (see table below for values associated with these variables):
   Cs = Combined health and environment hazard score
   Ds = Disclosure score

If the combined ingredient score is > 2:

Initial Product score

If the combined ingredient score is greater than 2, then disclosure cannot improve a product’s score, since the ingredients in the product show sufficient hazard to warrant concern and an initial product score of “D”.

Finally, the product score is adjusted based on several criteria to reflect additional hazard concerns or qualities that make the product healthier and more eco-friendly. These criteria may include:

  1. pH. A product with a pH at or below 2.0 or at or above 11.5 gets a less favorable score due to risk of permanent damage to skin or eyes from corrosive properties.
  2. Volatile Organic Compounds (VOC) content. VOCs react with nitrogen oxides to form ozone, which has serious health and environmental effects. Products containing more than 50 or 75 percent VOCs receive progressively worse scores since they contribute to the formation of ozone and poor air quality.
  3. Regulatory violations. If a product would violate a regulatory standard, such as one set by the state of California or the European Union, its score will be docked to reflect this.

Points are added to or subtracted from the final product score according to specific criteria and constrained between -10 and +10. Scores are translated to letter grades using the following scale:

A = -10 to -6
B = greater than -6 to -2
C = greater than -2 to 2
D = greater than 2 to 6
F = greater than 6 to 10
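
The letter-grade scale can be expressed as a small lookup. This is a sketch with a hypothetical function name. It clamps the input to the range -10 to +10, as the text describes for final scores, and treats each upper bound as inclusive, following the "greater than" wording of the scale.

```python
def letter_grade(score: float) -> str:
    """Translate a final product score (-10 to +10) into a letter grade."""
    s = max(-10.0, min(10.0, score))  # constrain to the valid range
    if s <= -6:
        return "A"
    if s <= -2:
        return "B"
    if s <= 2:
        return "C"
    if s <= 6:
        return "D"
    return "F"
```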







