Portion, package or tableware size for changing selection and consumption of food, alcohol and tobacco

Summary of findings for the main comparison. Food: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected

Food: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected
Population: children and adults Settings: high‐income countries, laboratory and field settings Intervention: larger‐sized portion, package, individual unit or item of tableware Comparison: smaller‐sized portion, package, individual unit or item of tableware
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	Smaller‐sized portion, package, individual unit or item of tableware	Larger‐sized portion, package, individual unit or item of tableware
Consumption	Mean daily energy intake from food among a representative sample of UK children and adults is 1689 kcal³	Mean daily energy intake from food would be 189 kcal (11.2%) higher with the intervention (144 to 228 kcal higher) among UK children and adults	Mean consumption in the intervention group was 0.38 standard deviations higher (0.29 higher to 0.46 higher)	6603 (86 independent comparisons)	⊕⊕⊕⊝ MODERATE ¹
‐ Consumption among children	Mean daily energy intake from food among a representative sample of UK children is 1651 kcal³	Mean daily energy intake from food would be 95 kcal (5.7%) higher with the intervention (45 to 140 kcal higher) among UK children	Mean consumption in the intervention group was 0.21 standard deviations higher (0.1 higher to 0.31 higher)	1421 (22 independent comparisons)	⊕⊕⊕⊝ MODERATE ¹
‐ Consumption among adults	Mean daily energy intake from food among a representative sample of UK adults is 1727 kcal³	Mean daily energy intake from food would be 247 kcal (14.3%) higher with the intervention (215 to 279 kcal higher) among UK adults	Mean consumption in the intervention group was 0.46 standard deviations higher (0.40 higher to 0.52 higher)	5182 (64 independent comparisons)	⊕⊕⊕⊝ MODERATE ¹
Selection without purchase	Mean daily energy intake from food among a representative sample of UK children and adults is 1689 kcal³	Mean daily energy intake from food would be 209 kcal (12.4%) higher with the intervention (119 to 293 kcal higher) among UK children and adults⁴	Mean selection without purchase in the intervention group was 0.42 standard deviations higher (0.24 higher to 0.59 higher)	1164 (13 independent comparisons)	⊕⊕⊕⊝ MODERATE ¹
‐ Selection without purchase among children	Mean daily energy intake from food among a representative sample of UK children is 1651 kcal³	Mean daily energy intake from food would be 63 kcal (3.8%) higher with the intervention (27 to 153 kcal higher) among UK children⁴	Mean selection without purchase in the intervention group was 0.14 standard deviations higher (0.06 lower to 0.34 higher)	382 (4 independent comparisons)	⊕⊕⊝⊝ LOW ^1,2
‐ Selection without purchase among adults	Mean daily energy intake from food among a representative sample of UK adults is 1727 kcal³	Mean daily energy intake from food would be 188 kcal (10.9%) higher with the intervention (188 to 403 kcal higher) among UK adults⁴	Mean selection without purchase in the intervention group was 0.55 standard deviations higher (0.35 higher to 0.75 higher)	782 (9 independent comparisons)	⊕⊕⊕⊝ MODERATE ¹
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in representative UK samples³ and the relative effect of the intervention (and its 95% CI). CI: confidence interval
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹Rated down by one level for study limitations: we assessed risk of bias as unclear or high in all incorporated studies. ²Rated down by one level for imprecision: number of participants (effective sample size) incorporated into analysis is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial (optimal information size) and the confidence interval crosses zero. ³Estimates of means and standard deviations based on an unweighted analysis of data from the UK National Diet and Nutrition Survey, Years 1‐4 (National Centre for Social Research 2012) ‐ see Data synthesis. ⁴Illustration of equivalent absolute effect on daily energy intake from food assumes that all foods selected are consumed.

Summary of findings 2. Alcohol: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected

Alcohol: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected
Population: children and adults Settings: high‐income countries, laboratory and field settings Intervention: larger‐sized portion, package, individual unit or item of tableware Comparison: smaller‐sized portion, package, individual unit or item of tableware
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	Smaller‐sized portion, package, individual unit or item of tableware	Larger‐sized portion, package, individual unit or item of tableware
Consumption	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Consumption among children	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Consumption among adults	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
Selection with or without purchase	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Selection with or without purchase among children	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Selection with or without purchase among adults	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.

Summary of findings 3. Tobacco: Longer versus shorter cigarettes for changing quantity consumed or selected

Tobacco: Longer versus shorter cigarettes for changing quantity consumed or selected
Population: children and adults Settings: high‐income countries, laboratory settings Intervention: longer cigarettes Comparison: shorter cigarettes
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	Shorter cigarettes	Longer cigarettes
Consumption	Mean number of cigarettes smoked per day among a representative sample of UK adults is 13	Mean number of cigarettes smoked per day would be 2 higher with the intervention (1 to 5 higher) among UK adults	Mean consumption in the intervention group was 0.25 standard deviations higher (0.14 lower to 0.65 higher)	108 (6 independent comparisons)	⊕⊕⊝⊝ LOW ^1,2	‐
‐ Consumption among children	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Consumption among adults	Mean number of cigarettes smoked per day among a representative sample of UK adults is 13	Mean number of cigarettes smoked per day would be 2 higher with the intervention (1 to 5 higher) among UK adults	Mean consumption in the intervention group was 0.25 standard deviations higher (0.14 lower to 0.65 higher)	108 (6 independent comparisons)	⊕⊕⊝⊝ LOW ^1,2	‐
Selection with or without purchase	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Selection with or without purchase among children	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Selection with or without purchase among adults	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹Rated down by one level for study limitations: we assessed risk of bias as unclear or high in all incorporated studies. ²Rated down by one level for imprecision: number of participants (effective sample size) incorporated into analysis is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial (optimal information size) and confidence interval crosses zero. ³Estimates of means and standard deviations based on an unweighted analysis of data from the UK Opinions and Lifestyle Survey, 2012 (Office for National Statistics 2012) ‐ see Data synthesis.

Summary of findings 4. Food: Shorter, wider versus taller, narrower glasses or plastic bottles (shape) for changing quantity of non‐alcoholic beverages consumed or selected

Shorter, wider versus taller, narrower glasses or plastic bottles (shape) for changing quantity of non‐alcoholic beverages consumed or selected
Patient or population: children and adults Settings: high‐income countries, field settings Intervention: shorter, wider glasses or plastic bottles Comparison: taller, narrower glasses or plastic bottles
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	Shorter, wider glasses or plastic bottles	Taller, narrower glasses or plastic bottles
Consumption	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve among a representative sample of UK adults is 245 grams⁸	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve would be 195 grams (79.6%) higher with the intervention (95 to 296 grams higher) among UK adults	Mean consumption in the intervention group was 1.17 standard deviations higher (0.57 higher to 1.78 higher)	50 (1 independent comparison)	⊕⊝⊝⊝ VERY LOW ^1,2	‐
‐ Consumption among adults	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve among a representative sample of UK adults is 245 grams⁸	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve would be 195 grams (79.6%) higher with the intervention (95 to 296 grams higher) among UK adults	Mean consumption in the intervention group was 1.17 standard deviations higher (0.57 higher to 1.78 higher)	50 (1 independent comparison)	⊕⊝⊝⊝ VERY LOW ^1,2	‐
‐ Consumption among children	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
Selection without purchase	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve among a representative sample of UK children and adults is 234 grams⁸	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve would be 242 grams (103.4%) higher with the intervention (86 to 400 grams higher) among UK children and adults⁹	Mean selection without purchase in the intervention group was 1.47 standard deviations higher (0.52 higher to 2.43 higher)	232 (3 independent comparisons)	⊕⊕⊝⊝ LOW ^3,4	‐
‐ Selection without purchase among children	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve among a representative sample of UK children is 228 grams⁸	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve would be 377 grams (165.5%) higher with the intervention (292 to 462 grams higher) among UK children⁹	Mean selection without purchase in the intervention group was 2.31 standard deviations higher (1.79 higher to 2.83 higher)	96 (1 independent comparison)	⊕⊕⊝⊝ LOW ^{5, 6}	‐
‐ Selection without purchase among adults	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve among a representative sample of UK adults is 245 grams⁸	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve would be 171 grams (70.1%) higher with the intervention (68 to 274 grams higher) among UK adults⁹	Mean selection without purchase in the intervention group was 1.03 standard deviations higher (0.41 higher to 1.65 higher)	136 (2 independent comparisons)	⊕⊕⊝⊝ LOW ^3,7	‐
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹Rated down two levels for study limitations: study assessed at high risk of bias with respect to the consumption outcome (see Characteristics of included studies 'Risk of bias' tables). ²Rated down one level for imprecision: number of participants (effective sample size) incorporated into analysis is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial (optimal information size) based on the lower limit of the confidence interval. ³Rated down one level for study limitations: studies assessed at unclear or high risk of bias with respect to the selection outcome (see Characteristics of included studies 'Risk of bias' tables). ⁴Rated down one level for inconsistency. I² statistic from the random‐effects model shows that 90.1% of the total variance in study‐level estimates of this effect was due to statistical heterogeneity. ⁵Rated down one level for study limitations: study assessed at unclear risk of bias with respect to the selection outcome (see Characteristics of included studies 'Risk of bias' tables). ⁶Rated down one level for imprecision: single study. ⁷Rated down one level for inconsistency: point estimates are dissimilar and confidence intervals do not overlap. ⁸Estimates of means and standard deviations based on an unweighted analysis of data from the UK National Diet and Nutrition Survey, Years 1‐4 (National Centre for Social Research 2012) ‐ see Data synthesis. ⁹Illustration of equivalent absolute effect on quantity of energy‐containing non‐alcoholic beverages consumed in single serve assumes that all energy‐containing non‐alcoholic beverage selected in a single serve is consumed.

Background

Description of the condition

Non‐communicable diseases, principally cardiovascular diseases, diabetes, certain forms of cancer and chronic respiratory diseases, accounted for an estimated 62% of all deaths worldwide in 2012 (World Health Organization 2014a), and globally the proportion of years of life lost as a result of non‐communicable diseases increased from 38% in 2000 to 47% in 2012 (World Health Organization 2014b). Major risk factors for these conditions are in part determined by patterns of behaviour that are in principle modifiable, including consumption of food, alcohol and tobacco products (United Nations 2014). Identifying interventions that are effective in achieving sustained health behaviour change has therefore become one of the most important public health challenges of the 21^st century.

Description of the intervention

It is increasingly recognised that the physical environments that surround us can exert considerable influences on our health behaviour and that altering these environments may provide a catalyst for behaviour change (Das 2012). In a recent scoping review, we described a class of interventions that involve altering the properties or placement of objects or stimuli within micro‐environments such as shops, restaurants, bars or homes, with the intention of changing health‐related behaviours (Hollands 2013a; Hollands 2013b).

The size of a portion or package is a modifiable property of food, alcohol and tobacco products that may influence their selection and consumption. In the case of food and alcohol products, the size or shape of an item of tableware used to consume such products may similarly influence their selection and consumption. Examples include the portion size of alcoholic beverages served in bars or of foods served in restaurants, at a buffet or in the home, such as portions of a dish served to restaurant customers (Diliberti 2004), the size or shape of plates or glasses used to serve products (Shah 2011), and the number or length of cigarettes in packets sold in shops (Russell 1980). In this context, the intervention involves manipulation of the size or physical dimensions of a food, alcohol or tobacco product, its packaging or the tableware used in its consumption. Comparisons of interest are between products, packages or items of tableware that differ only in terms of these properties.

How the intervention might work

There are considerable influences on behaviour that are beyond individuals' deliberative control. Indeed, it has been suggested that most human behaviour occurs outside of awareness, cued by stimuli in environments and resulting in actions that may be largely unaccompanied by conscious reflection (Marteau 2012; Neal 2006). This proposition has led to increasing policy and research attention being placed on interventions with mechanisms of action that are less dependent on the conscious engagement of the recipients, including interventions that involve altering properties of objects or stimuli within the small‐scale environments that surround and cue behaviour (Hollands 2013a).

A number of mechanisms of action have been proposed to explain how the size of products may affect their consumption (Herman 2015; Steenhuis 2009). It has been suggested that as the amount of a product made available for consumption is increased, individuals will continue to perceive each increasing amount as an appropriate quantity to consume. This phenomenon may be explained by several mediating factors including personal and social norms about what constitutes a suitable amount of a product to consume. Such norms can be influenced by the amounts that are presented for consumption, and larger portions of food have become increasingly prevalent, making it increasingly unlikely that smaller portions are viewed as normal or appropriate for a single serving (Young 2002). There is also a tendency for individuals to engage most comfortably with a product as a single entity independent of its size. This 'unit bias' means that they are predisposed to consume the entirety of a product even as it changes size (Geier 2006). In addition, the way in which products are presented can influence their consumption. The presentation of food and alcohol products often entails the use of tableware, such as plates, glasses or cutlery. Not only does the size of tableware have the potential to directly influence the amount of a product available for consumption (Pratt 2012), but its physical dimensions can elicit various cognitive biases (Wansink 2005), which may influence perceptions of quantity and in turn determine levels of consumption. Similarly, sub‐dividing a fixed portion of a food into smaller pieces also affects perceptions of quantity (Scisco 2012). All of these mechanisms may also influence product selection (with or without purchasing), which is an important intermediate outcome in pathways to consumption.

Extant research involving the experimental manipulation of portion, package or tableware size has focused on food (including non‐alcoholic beverage) products to a much greater extent than tobacco products (Hollands 2013a). Whilst the causal mechanisms of underlying potential effects of such manipulations on selection or consumption of tobacco may be assumed to be broadly similar to food, smokers are known to titrate their received dose of nicotine to regulate the level in the body, with the potential to attenuate the effects of interventions to alter the size of tobacco products (Kozlowski 1986).

Why it is important to do this review

A recent scoping review of evidence for the effects of choice architecture interventions identified a substantial number of randomised controlled trials that have investigated the effects of exposure to different portion, package or tableware sizes on selection and consumption behaviours (Hollands 2013a). The majority of these studies focused on food products, but because both tobacco and alcohol use also involve the selection and consumption of products, similar interventions may have the potential to change these behaviours via similar mechanisms. To our knowledge, evidence from these studies has yet to be synthesised using rigorous systematic review methods that include assessment of risk of bias and investigation of potential effect modifiers, nor to encompass alcohol and tobacco use. As such, we do not yet have reliable estimates of the effects of altering the sizes of portions, packages or tableware on product selection and consumption, nor of the influence of factors that may modify any such effects. Both are necessary to inform the selection and design of effective public health interventions.

Interventions that aim to reduce people's exposure to larger or smaller food portions, as opposed to those that involve providing information to encourage health behaviour change, may also have the potential to reduce health inequalities if they rely less on recipients' levels of literacy, numeracy and cognitive control, which have been found to be lower in population subgroups experiencing higher levels of social and material deprivation (Kutner 2006; Marteau 2012; Spears 2010; Williams 2003). Despite evidence that behaviours with the potential to undermine health are socially patterned (for example, that people in lower socioeconomic groups tend to consume less fruit and vegetables (Giskes 2010)), potential differences in behavioural responses to product sizing interventions between socioeconomic subgroups remain unclear. Also, to our knowledge (prior to conducting this review), no studies of the effects of product size had been conducted in low or middle‐income (LMIC) country populations (Hollands 2013a). This review therefore includes a focus on identifying evidence for differential effects of exposure to different sizes of these products between socioeconomic subgroups (and between studies conducted in LMIC and high‐income countries (HIC)), highlight any identified gaps in this aspect of the evidence base, and seek to draw implications for the potential of such interventions to affect health inequalities.

This systematic review is also timely given current interest in the topic within public health policy circles. There is evidence from the USA and Europe that portion sizes have been increasing since the 1970s (Young 2002; Young 2012). There have also been recent attempts to regulate the size of products in order to reduce consumption levels and improve public health, such as New York City Mayor Michael Bloomberg's proposed ban on the sale of sugary drinks larger than 16 oz (473 ml) (Gabbatt 2013). In the UK, there are recent examples of companies reducing the portion sizes of confectionery and sugary drinks as part of the Public Health Responsibility Deal in England. This systematic review can contribute to a better evidence‐based understanding of the potential impact of such policies.

Objectives

To assess the effects of interventions involving exposure to different sizes or sets of physical dimensions of a portion, package, individual unit or item of tableware on unregulated (ad libitum) selection or consumption of food, alcohol or tobacco products in adults and children.
To assess the extent to which the effects of such interventions may be modified by:
1. study characteristics, such as target product type (food, alcohol, tobacco) or whether the target of the manipulation is a portion, package, individual unit or item of tableware;
2. intervention characteristics, such as magnitude of the difference in size; and
3. participant characteristics, such as age, gender or socioeconomic status (to facilitate an assessment of social differentiation in effects relevant to health equity).

Methods

Criteria for considering studies for this review

Types of studies

Randomised controlled trials with between‐subjects (parallel‐group) or within‐subjects (cross‐over) designs, conducted in laboratory or field settings. We excluded non‐randomised studies because our recent scoping review indicated that a sufficient number of eligible randomised controlled trials would be available to address our aim to synthesise evidence for intervention effects (Hollands 2013a). A key issue is that, compared with randomised controlled trials, non‐randomised studies rely on more stringent and sometimes non‐verifiable assumptions in order to confer confidence that, with successful implementation of the study design, the risk of systematic differences between comparison groups beyond the intervention of interest (i.e. confounding) is sufficiently low to permit valid inferences about causal effects.

Types of participants

Adults and children directly engaged with the manipulated products. We set no exclusion criteria in relation to demographic, socioeconomic or clinical characteristics or prognostic factors. We excluded studies involving non‐human participants (animal studies).

Types of interventions

Interventions eligible to be considered in this review were those that involved comparison of the effects of exposure to at least two sizes or sets of visible physical dimensions (that is volume, shape, height, width or depth) of either a portion of the same food (including non‐alcoholic beverages), alcohol or tobacco product, its package or individual unit size, or an item of tableware used to consume it. An eligible study could therefore include multiple eligible comparisons. For example, in a three‐arm between‐subjects study comparing the effects of exposure to a 200 g, 300 g or 400 g portion of pasta with sauce, eligible comparisons are: 200 g versus 300 g; 300 g versus 400 g; and 200 g versus 400 g (see also Data synthesis).

'Portion' refers to the overall amount (volume, weight or both) of a product that is presented for selection or consumption (for example, 200 g versus 300 g of pasta, 275 ml versus 440 ml of beer, or a packet of 10 versus 20 cigarettes). 'Package' refers to the different ways of packaging a specific portion, including that used for service, consumption or storage (for example, boxes, bags, cans or bottles). For example, the same portion of a food could be served within one large bag or multiple smaller bags. 'Individual unit' refers to the unit of a product that is presented within a given portion (for example, individual sweets or candies, biscuits or cookies, or cigarettes). 'Tableware' refers to crockery, cutlery or glassware used for serving or consuming food or drink (for example, plates, bowls, knives, forks, spoons or glasses). Packages and tableware as defined in this way have the capacity to limit or increase the portion or individual unit size of the consumed product and may therefore influence any corollary effects on selection and consumption.

We excluded the following:

Interventions in which product size and/or shape may have been altered indirectly as a result of a higher‐level intervention but were not directly manipulated, to safeguard implementation fidelity (e.g. organisational‐level interventions to encourage the introduction of small‐scale environmental changes to alter product selection or consumption).
Interventions in which the behavioural responses of participants (that is, selection or consumption levels or rates) were regulated by either explicit instructions to participants or some other action of the researcher (e.g. participants exposed to a product were given instructions on how much they should consume or a target rate of consumption). In such cases, selection or consumption of the manipulated product cannot be considered unregulated (ad libitum).
Studies that compared packages, portions, individual units or tableware of different types or with different functions. For example, we excluded studies that made comparisons between different, differently sized eating utensils (e.g. straw versus spoon; chopsticks versus fork) whilst studies that made comparisons between different sizes of the same eating utensil were included (e.g. small spoon versus large spoon).
Studies in which there were concurrent interventions unrelated to sizing that were intrinsically confounded with the comparison(s) of interest. For example, we excluded two‐arm studies in which one comparison group received a specified portion size and the other group received a smaller portion plus a concurrent nutritional labelling intervention.

Types of outcome measures

Primary outcomes

Behavioural endpoints

Eligible studies had to incorporate one or more measures of unregulated (ad libitum) consumption or selection (with or without purchasing) of food, alcohol or tobacco products. By unregulated, we refer to behaviour of participants that is not regulated by either explicit instructions or some other action of the researcher. Eligible studies may have measured consumption or selection in terms of quantities of manipulated products and/or quantities of non‐manipulated products. For example, a study investigating the effects of exposure to a large versus small portion of a pasta entrée, provided as part of a lunch meal, may have measured consumption in terms of energy intake from the entrée itself, or from a non‐manipulated vegetable side dish served with the entrée, or from the total lunch meal (that is, both manipulated and non‐manipulated components), or from all meals taken over the course of a whole day. Similarly, quantities consumed or selected may have been measured over a time period less than (immediate) or exceeding one day (longer‐term).

Our choice of eligible outcome constructs reflected a focus on the assessment of the effects of eligible interventions in terms of the types and amounts of food, alcohol and tobacco people consume, coupled with recognition that amount selected (with or without purchasing) is an important intermediate endpoint in pathways to consumption. We anticipated encountering a range of measures of these outcome constructs within included studies, and presented the following examples in the published protocol for this review.

1. Consumption (intake) of a product

We assessed the amount of energy (e.g. calories), substances (e.g. carbon monoxide, alcohol, saturated fat), or products (e.g. food, drink or tobacco) consumed, measured in applicable natural units (e.g. kcals, kilojoules, grams). Objective measurement may involve calculating the amount of a product consumed by subtracting the amount remaining after consumption from the amount presented to the participant. Alternatively, it may involve direct observation of the individual by outcome assessors. Subjective measurement would involve participant self report.

2. Selection of a product

a) Without purchase

b) With purchase

As per consumption, we assessed the amount of energy, substances or products selected for consumption, measured in applicable natural units. Depending on the study setting, a product may be selected with or without this act enjoining a purchase (that is, a transfer of money to the vendor).

Conceptual model

To supplement study eligibility criteria, we developed a provisional conceptual model that was published in the protocol for this review (Hollands 2014). This conceptual model was design‐oriented in the sense that its purpose was to help direct the review process by providing a simplified visual representation of the causal system of interest: the proposed causal pathway between eligible interventions and their outcomes (behavioural endpoints), and potential moderators of that relationship (effect modifiers) given that differential effects were plausible (Anderson 2011; Anderson 2013). We used the provisional conceptual model to inform the development of search strategies, data extraction forms and a provisional framework for the statistical analysis of outcome data collected from the eligible studies (see Search methods for identification of studies and Data collection and analysis). We iteratively revised the provisional conceptual model based on theory and evidence encountered in eligible studies during the course of the review process, and documented all revisions including the rationale for each revision and supporting evidence (see Data collection and analysis). We used the provisional and subsequent iterations of the conceptual model as a reference point for the design (in the protocol) and conduct (post‐protocol) of all stages of the systematic review up to and including data synthesis, and as a conceptual basis for explicit reporting of the methods and assumptions employed within the synthesis (Anderson 2013). In practice, iterative refinement of the conceptual model primarily involved incorporating further potential effect modifiers identified from theory and evidence presented in included study reports, which became candidates for consideration in the meta‐regression analysis (see Data collection and analysis). The final version of the conceptual model is shown Figure 1.

Figure 1

Final conceptual model. The 28 constructs included in the provisional conceptual model (Hollands 2014) and retained in this final version are shown in plain type. The 22 constructs added to this final conceptual model based on theory and evidence encountered during the review process are shown in red type. The 2 constructs included in the provisional conceptual model (Hollands 2014) but excluded from this final version are shown in strikethrough plain type. See Table 1 for a full record of the conceptual model development process.

Within the conceptual model (Figure 1) we distinguished between three sets of potential effect modifiers: study characteristics; intervention characteristics; and participant characteristics. Within our analytic framework for quantitative synthesis of outcome data collected from the included studies (see Data collection and analysis), potential effect‐modifying impacts of participant characteristics could in practice only be investigated based on between‐study comparisons, due to lack of reporting of results by participant subgroups within the included studies.

Search methods for identification of studies

We initiated an original search, applying the methods described below in this section, in November 2012. We conducted an updated search, applying the same methods, prior to publication of the current version of the review, with a search date up to and including 30 January 2015. We have added eligible studies identified by the updated search (with subsequent title/abstract and full‐text screening) to Characteristics of studies awaiting classification, provisionally analysed them and will fully incorporate them into the review at the next update (see also Results of the search, Appendix 1 and Appendix 2).

Electronic searches

We conducted electronic searches for eligible studies within each of the following databases:

Cochrane Central Register of Controlled Trials (CENTRAL 2015, Issue 1) (1992 to 30 January 2015);
MEDLINE (OvidSP) (including MEDLINE In‐Process) (1946 to 30 January 2015);
EMBASE (OvidSP) (1980 to 30 January 2015);
PsycINFO (OvidSP) (1806 to 30 January 2015);
Applied Social Sciences Index and Abstracts (ProQuest) (1987 to 30 January 2015);
Food Science and Technology Abstracts (Web of Knowledge) (1969 to 22 November 2012);
Science Citation Index Expanded (Web of Knowledge) (1900 to 30 January 2015);
Social Sciences Citation Index (Web of Knowledge) (1956 to 30 January 2015);
Trials Register of Promoting Health Interventions (EPPI Centre) (2004 to 30 January 2015).

We developed a MEDLINE search strategy by combining sets of controlled vocabulary and free‐text search terms based on the eligibility criteria described above (see Criteria for considering studies for this review). This was externally peer‐reviewed by an information retrieval specialist and Co‐convenor of the Cochrane Information Retrieval Methods Group and revised based on their peer‐review comments. We tested the MEDLINE search strategy for its sensitivity to retrieve a reference set of 48 records of reports of potentially eligible studies known to be indexed in MEDLINE that were identified by our preceding scoping review (Hollands 2013a). We adapted the final MEDLINE search strategy for use to search each of the other databases listed above based on close examination of database thesauri and scope notes if available. We imposed no restrictions for publication date, publication format or language and incorporated no study design filters. Full details of final search strategies for each database, along with search dates and yields (for both the original search and the updated search), are provided in Appendix 1.

Searching other resources

We conducted electronic searches of two grey literature resources using search strategies adapted from the final MEDLINE search strategy:

Conference Proceedings Citation Index ‐ Science (Web of Knowledge) (1990 to 30 January 2015);
Conference Proceedings Citation Index ‐ Social Science & Humanities (Web of Knowledge) (1990 to 30 January 2015);
Open Grey ‐ www.opengrey.eu (1980 to 30 January 2015).

We also searched trial registers (ClinicalTrials.gov and the World Health Organization International Clinical Trials Registry Platform (ICTRP)) to identify registered trials, and the websites of the following key organisations in the area of health and nutrition:

Centers for Disease Control and Prevention, USA;
EU Platform for Action on Diet, Physical Activity and Health;
International Obesity Task Force;
Rudd Centre for Food Policy and Obesity, USA;
UK Department of Health;
World Health Organization.

In addition, we searched the reference lists of all eligible study reports that had been identified using the other search methods described above and undertook forward citation tracking (using Google Scholar and PubMed) to identify further eligible studies or study reports.

Data collection and analysis

Selection of studies

We imported title‐abstract records retrieved by the electronic searches to EPPI Reviewer 4 (ER4) systematic review software (Thomas 2010). We identified, reviewed manually and removed duplicate records using ER4's automatic de‐duplication feature with the similarity threshold set initially to 0.85 and finally to 0.80 following satisfactory manual checks of incomplete duplicate groups. Two researchers working independently (GJH, IS) undertook duplicate screening of title‐abstract records. We coded title‐abstract records as 'provisionally eligible', 'excluded' or 'duplicate' by applying the eligibility criteria described above (see Criteria for considering studies for this review). Disagreements in the coding of title‐abstract records were identified and resolved by discussion to reach consensus between the two researchers (GJH, IS).

We obtained copies of corresponding full‐text study reports for all title‐abstract records coded as 'provisionally eligible'. Two researchers working independently (GJH, IS) undertook duplicate screening of full‐text study reports. We coded full‐text study reports as 'eligible' or 'excluded' by applying the eligibility criteria described above (see Criteria for considering studies for this review). Coding disagreements were again identified and resolved by discussion to reach consensus between the two researchers, with a third researcher (DO) acting as arbiter when needed. We recorded bibliographic details of study reports excluded at the full‐text screening stage, along with the primary reason for exclusion, in a Characteristics of excluded studies table. We identified and linked multiple full‐text reports of the same study. We also identified full‐text reports comprising multiple eligible studies. We documented the flow of records and studies through the systematic review process using a PRISMA flow diagram (Moher 2009).

Data extraction and management

We developed an electronic data extraction form based on the Cochrane Public Health Review Group's template (http://ph.cochrane.org/review‐authors). We piloted an initial draft form using a selection of 10 included studies and then amended this in consultation with other members of the review team. One researcher (GJH or IS) extracted data on characteristics of included studies, while two researchers working independently (GJH, IS) extracted outcome data in duplicate. We only collected outcome data relating to comparison groups eligible for consideration in this review, but Characteristics of included studies tables record details of all study arms (conditions). Discrepancies in extracted outcome data were identified and resolved by checking against the study report, discussion and consensus between two researchers (GJH, IS). We sought key data missing from reports of included studies by contacting study authors.

At the protocol stage, we intended to collect the data summarised immediately below in this section. This represented the core data set (comprising 28 pre‐specified moderator constructs for potential examination using meta‐regression analyses; see Data synthesis) that we could reasonably anticipate would need to be collected based on our study eligibility criteria (see Criteria for considering studies for this review) and provisional conceptual model (Hollands 2014).

Study characteristics

Study design: between‐subjects design, within‐subjects design
Study (intervention) setting: laboratory, field; for consumption at home or away from home
Product type: food (including non‐alcoholic beverages), alcohol, tobacco
Product healthiness: Food Standards Agency (FSA) score (Rayner 2005) at level of specific product or, if not possible, at level of product category
Target of manipulation: portion, package, individual unit, tableware
Type of manipulation: size (including volume) or shape
Manipulation from a standard size: no or yes*
If applicable, direction of the change relative to standard size: smaller or larger*
If applicable, selection with purchasing or selection without purchasing
Concurrent intervention components (e.g. nutritional labelling)
Socioeconomic status context (low, high)

Intervention characteristics

Magnitude of the absolute difference in size (e.g. difference in quantity): smaller size always coded as Intervention 1 and larger size as Intervention 2
Magnitude of the relative difference in size (e.g. percentage difference in quantity): smaller size always coded as Intervention 1 and larger size as Intervention 2

Participant characteristics

Age/age group
Gender: male, female
Ethnicity
Body mass index (BMI); body weight; body weight status
Behavioural characteristics (e.g. dietary restraint; susceptibility to hunger)
Biological state (e.g. hunger)
Other clinical characteristics (e.g. morbidities such as cardiovascular diseases, diabetes, psychiatric disorders)
Socioeconomic status (e.g. occupational status; education; income; food insecurity; welfare receipt)
Summary risk of bias

These participant characteristics cover several categories of social differentiation relevant to health equity, namely: age, ethnicity, gender, occupation, education, income and other proxy measures of socioeconomic status. The incorporation of study‐level data on these participant characteristics into our proposed meta‐regression analysis (see 'Data synthesis') was in part intended to enable us to interpret any differential effects through a health equity lens (Welch 2012) (see also Objectives 2c).

As anticipated, our conceptual model ‐ and consequently the core data set ‐ evolved as the review process progressed. First, we excluded a pair of potential effect modifiers (study characteristics) included in our provisional conceptual model that express studied portion size manipulations relative to a standard size (see asterisked characteristics '*' in the list of 'Study characteristics', above), since it was not judged feasible to define standard sizes based on information reported in included studies. Second, the process of collecting data from included studies identified 22 additional potential effect modifiers (moderator constructs) that were added to the conceptual model. These additional constructs were included in the current, published review version of the conceptual model (Figure 1) and are listed below:

Study characteristics

Product energy density
Duration of exposure
Relationship between manipulated product(s) and outcome(s)

Intervention characteristics

None added.

Participant characteristics

Behavioural characteristics (susceptibility to hunger; external eating; emotional eating; plate cleaning tendency; consumption monitoring; binge eating; dieting behaviour; mood; habitual dietary energy intake; habitual dietary macronutrient intake (carbohydrate; protein; fat); physical activity; energy expenditure; physical exercise)
Biological state (fullness; satiety; prospective consumption)
Other clinical characteristics (depression)

We coded 28 variables that measured these constructs from included studies (as well as coding 43 variables that measured constructs included in the initial conceptual model). The current, published review version of our conceptual model (Figure 1) therefore comprised 48 moderator constructs, with 72 corresponding variables, for potential examination using meta‐regression analyses. Table 1 traces this iterative conceptual model development process, documenting all revisions made between the protocol (Hollands 2014) and final versions (Figure 1), together with the rationale and supporting evidence for each revision.

See: Summary of findings for the main comparison Food: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected; Summary of findings 2 Alcohol: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected; Summary of findings 3 Tobacco: Longer versus shorter cigarettes for changing quantity consumed or selected; Summary of findings 4 Food: Shorter, wider versus taller, narrower glasses or plastic bottles (shape) for changing quantity of non‐alcoholic beverages consumed or selected

Table 1. Record of conceptual model development

Construct	Variable description (type)	Category	Included in provisional conceptual model?	Included in final conceptual model?	Included study first encountered	Other included studies encountered	Rationale for inclusion in final conceptual model	Supporting evidence	Rationale for exclusion from final conceptual model
Study design	Randomised controlled trial or cluster‐randomised controlled trial (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Study design	Between subjects or within‐subjects design (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Study/ intervention setting	Laboratory or field setting (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Study/ intervention setting	Selecting/consuming alone or selecting/consuming with others	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Product type	Food or tobacco (or alcohol ‐ no studies) (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Healthiness of manipulated product(s) (food products only)	FSA Nutrient Profile Score (Continuous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Basis for calculating healthiness of manipulated product(s) (food products only)	Specific product or product category (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Energy density of manipulated product(s) (food products only)	Energy density points from FSA Nutrient Profile model (Continuous)	Study characteristic	No	Yes	Devitt 2004 (study includes concurrent manipulation of energy density)	Kral 2004a, Fisher 2007b, Leahy 2008, Looney 2011, Rolls 2006b (studies include concurrent manipulation of energy density)	Evidence from previous studies that the energy density of food can exert independent and combined influences on energy intake suggests that this has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Kral 2004a, Kral 2004b, Rolls 2009, Bell 1998, Rolls 1999, Rolls 2006b	N/A
Target of manipulation	Portion, package, individual unit, package with individual unit, or tableware (Categorical, nominal)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Type of manipulation	Size (including volume) or shape manipulation (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	Post‐hoc decision taken to conduct separate meta‐analyses for size and shape since comparisons of size were not judged conceptually comparable to comparisons of shape among the set of studies included in this review Therefore no longer conceptualised as a potential effect modifier
Manipulation from a standard size	No or yes (Categorical, dichotomous)	Study characteristic	Yes	No	N/A	N/A	N/A	N/A	In practice it was rarely possible to code this variable based on information in study reports, and not judged practicable to code with reference to data from external sources
If applicable, direction of the change relative to standard size	Smaller or larger (Categorical, dichotomous)	Study characteristic	Yes	No	N/A	N/A	N/A	N/A	In practice it was rarely possible to code this variable based on information in study reports, and not judged practicable to code with reference to data from external sources
Selection without purchasing or selection with purchasing	Selection without purchasing or selection with purchasing (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Duration of exposure to the intervention	≤ 1 day or > 1 day (Categorical, dichotomous)	Study characteristic	No	Yes	N/A – Added based on discussion of collected data between 2 review authors (GJH and IS, April 2014), which identified duration of exposure as a variant characteristic of included studies (in addition to timing of outcome measurement, which had been included in our provisional conceptual model)	N/A	Duration of exposure to larger portions, packages, individual units or tableware has the potential to modify any effects of such exposure on the selection and consumption of food	Rolls 2006a, Rolls 2007a	N/A
Relationship between manipulated product(s) and consumption/ selection outcomes (food products only)	The manipulated foods comprise all of those in the study and all are selected or consumed ad libitum (Dummy)	Study characteristic	No	Yes	N/A – Added based on discussion of collected data between 2 review authors (GJH and IS, April 2014), which identified duration of exposure as a variant characteristic of included studies	N/A	This relationship may have the potential to modify any effects of such exposure on the selection and consumption of food. This is because providing any additional foods for consumption beyond those manipulated may result in additional energy consumption in either or both conditions. Given potential ceiling effects on total consumption, this could modify any intervention effect	‐	N/A
Relationship between manipulated product(s) and consumption/ selection outcomes (food products only)	The manipulated foods are only a subset of all the foods in the study and there are other non‐manipulated foods that are compulsory to select or consume (Dummy)	Study characteristic	No	Yes	N/A – Added based on discussion of collected data between 2 review authors (GJH and IS, April 2014), which identified duration of exposure as a variant characteristic of included studies	N/A	This relationship may have the potential to modify any effects of such exposure on the selection and consumption of food. This is because providing compulsory additional foods beyond those manipulated would result in additional energy consumption in both conditions. Given potential ceiling effects on total consumption, this could attenuate any intervention effect	‐	N/A
Relationship between manipulated product(s) and consumption/ selection outcomes (food products only)	The manipulated foods are only a subset of all the foods in the study and there are other non‐manipulated foods in study that are selected or consumed ad libitum (Dummy)	Study characteristic	No	Yes	N/A – Added based on discussion of collected data between 2 review authors (GJH and IS, April 2014), which identified duration of exposure as a variant characteristic of included studies	N/A	This relationship may have the potential to modify any effects of such exposure on the selection and consumption of food. This is because providing additional foods to be consumed ad libitum beyond those manipulated may result in additional energy consumption in either or both conditions. Given potential ceiling effects on total consumption, this could modify any intervention effect	‐	N/A
Relationship between manipulated product(s) and consumption/ selection outcomes (food products only)	Outcome data maps directly onto the manipulated food(s) (as opposed to a wider set of foods, including but not limited to manipulated food(s)) (Dummy)	Study characteristic	No	Yes	N/A – Added based on discussion of collected data between 2 review authors (GJH and IS, April 2014), which identified duration of exposure as a variant characteristic of included studies	N/A	This relationship may have the potential to modify any effects of such exposure on the selection and consumption of food. This is because including any additional foods in outcome measurement beyond those manipulated may result in additional energy consumption being measured in either or both conditions	‐	N/A
Concurrent intervention component(s)	Absent or present (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socio‐economic status context	Low deprivation or high deprivation (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Magnitude of the absolute difference in size	Difference between larger size and smaller size in grams (Continuous)	Intervention characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Magnitude of the relative difference in size	Larger size expressed as a proportion (%) of smaller size (Continuous)	Intervention characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Age	Average (mean) age in years among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Gender	Proportion (%) of study completers who were female (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Ethnicity	Proportion (%) of study completers of white ethnicity (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body mass index (BMI)	Average (mean) BMI among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body mass index (BMI)	Average (mean) BMI‐z score among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body weight	Average (mean) weight in kilograms among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body weight status	Average (mean) percentage (%) body fat among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body weight status	Proportion (%) of study completers who were overweight (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body weight status	Proportion (%) of study completers who were obese (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body weight status	Proportion (%) of study completers who were overweight or obese (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: dietary restraint	Average (mean) dietary restraint score among study completers ‐ Three Factor Eating Questionnaire (Stunkard 1985) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: dietary restraint	Average (mean) dietary restraint score among study completers ‐ Dutch Eating Behaviour Questionnaire (Van Strien 1986) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: dietary restraint	Average (mean) dietary restraint score among study completers ‐ Restraint Scale (Herman 1980) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: dietary disinhibition	Average (mean) dietary disinhibition score among study completers ‐ Three Factor Eating Questionnaire (Stunkard 1985) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: dietary disinhibition	Average (mean) dietary disinhibition score among study completers ‐ Dutch Eating Behaviour Questionnaire (Van Strien 1986) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: external eating	Average (mean) external eating score among study completers ‐ Dutch Eating Behaviour Questionnaire (Van Strien 1986) (Continuous)	Participant characteristic	No	Yes	Hermans 2012	Kelly 2009, Kral 2004a	External eating (which measures the tendency to eat in response to external food‐related cues such as the sight, taste, and smell of attractive food) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Herman 2008, Burton 2007, Rodin 1981	N/A
Behavioural characteristics: emotional eating	Average (mean) emotional eating score among study completers ‐ Dutch Eating Behaviour Questionnaire (Van Strien 1986) (Continuous)	Participant characteristic	No	Yes	Kelly 2009	Kral 2004a	Emotional eating (which measures the tendency to eat in response to emotions such as anxiety, disappointment or boredom) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Van Strien 1986, Wallis 2009	N/A
Behavioural characteristics: susceptibility to hunger	Average (mean) hunger score among study completers – Three factor eating questionnaire (Stunkard 1985) (Continuous)	Participant characteristic	No	Yes	Flood 2006	Kral 2004a, Rolls 2002, Rolls 2004a, Rolls 2004b, Rolls 2006a, Rolls 2006b, Rolls 2007a, Rolls 2007b (S1), Rolls 2007b (S2), Rolls 2007b (S3), Rolls 2010a (E1), Rolls 2010b (E2)	Susceptibility to hunger (predisposition to feelings of hunger) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Provencher 2003, Lindroos 1997	N/A
Behavioural characteristics: plate cleaning tendency	Average (mean) plate cleaning tendency score among study completers ‐ 7‐point agreement scale anchored (‐3) strongly disagree and (+3) strongly agree (Marchiori 2012a, Wansink 2005e) (Continuous)	Participant characteristic	No	Yes	Marchiori 2012a	‐	Plate cleaning tendency (the tendency for a person to consume all the food presented to them) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Wansink 2005e	N/A
Behavioural characteristic: plate cleaning tendency	Behavioural characteristic ‐ Proportion (%) of adult study completers who often or always clean the plate (Continuous)	Participant characteristic	No	Yes	Rolls 2004a	‐	Plate cleaning tendency (the tendency for a person to consume all the food presented to them) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Wansink 2005e	N/A
Behavioural characteristic: plate cleaning tendency	Behavioural characteristic ‐ Proportion (%) of child study completers who often or always clean the plate (Continuous)	Participant characteristic	No	Yes	Rolls 2004a	‐	Plate cleaning tendency (the tendency for a person to consume all the food presented to them) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Wansink 2005e	N/A
Behavioural characteristics: consumption monitoring	Average (mean) consumption monitoring score among study completers ‐ 7‐point agreement scale anchored (‐3) strongly disagree and (+3) strongly agree (Continuous)	Participant characteristic	No	Yes	Marchiori 2012a	‐	Consumption monitoring (the tendency for a person to pay attention to and monitor the food they are consuming) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Polivy 1986	N/A
Behavioural characteristics: binge eating	Average (mean) binge eating score among study completers ‐ Eating Disorders Examination (Fairburn 1993) (Continuous)	Participant characteristic	No	Yes	Marchiori 2012a	‐	Binge eating (discrete episodes of eating during which the amount consumed is unusually large and there is a sense of loss of control over eating at the time) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Fairburn 1993	N/A
Behavioural characteristics: binge eating	Average (mean) binge eating score among study completers – Binge Eating Questionnaire (Gormally 1982) (Continuous)	Participant characteristic	No	Yes	Stroebele 2009	‐	Binge eating (discrete episodes of eating during which the amount consumed is unusually large and there is a sense of loss of control over eating at the time (Fairburn 1993)) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Fairburn 1993, Cooper 2003	N/A
Behavioural characteristics: dieting behaviour	Average (mean) dieting behavior score – Eating Attitude Test (EAT‐26) (Garner 1982) (Continuous)	Participant characteristic	No	Yes	Marchiori 2012a	Rolls 2002, Rolls 2004a, Rolls 2004b, Rolls 2007a	Dieting behaviour (behaviour that involves a person restricting themselves to smaller amounts or specific types of food either to lose weight or for medical reasons) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Van Strien 1986, Stunkard 1985	N/A
Behavioural characteristics: mood	Average (mean) mood score among study completers ‐ 7‐point agreement scale anchored (‐3) strongly disagree and (+3) strongly agree (Marchiori 2012a, Reinbach 2010) (Continuous)	Participant characteristic	No	Yes	Marchiori 2012a	‐	Mood has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Gardner 2014	N/A
Behavioural characteristics: habitual dietary energy intake	Average (mean) dietary energy intake per diem among study completers in kcal (Continuous)	Participant characteristic	No	Yes	Ahn 2010	Ebbeling 2007	Baseline level of dietary energy intake has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Fyfe 2010, Birch 1991	N/A
Behavioural characteristics: habitual dietary macronutrient intake, Carbohydrate	Average (mean) carbohydrate intake as a proportion (%) of daily energy intake among study completers (Continuous)	Participant characteristic	No	Yes	Ahn 2010	‐	Baseline levels of macronutrient intake have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Beasley 2009, Monteleone 2003, Yeomans 2001, Rolls 1988	N/A
Behavioural characteristics: habitual dietary macronutrient intake, Protein	Average (mean) protein intake as a proportion (%) of daily energy intake among study completers (Continuous)	Participant characteristic	No	Yes	Ahn 2010	‐	Baseline levels of macronutrient intake have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Beasley 2009, Rolls 1988	N/A
Behavioural characteristics: habitual dietary macronutrient intake, Fat	Average (mean) fat intake as a proportion (%) of daily energy intake among study completers (Continuous)	Participant characteristic	No	Yes	Ahn 2010	‐	Baseline levels of macronutrient intake have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Beasley 2009, Brennan 2012, Monteleone 2003, Yeomans 2001, Rolls 1988	N/A
Behavioural characteristics: physical activity	Average (mean) daily total number of steps among study completers (Continuous)	Participant characteristic	No	Yes	Ahn 2010	‐	Baseline levels of physical activity have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Martins 2007	N/A
Behavioural characteristics: habitual energy expenditure	Average (mean) daily energy expenditure among study completers in kcal (Continuous)	Participant characteristic	No	Yes	Rolls 2006a	Rolls 2006b, Rolls 2007a, Rolls 2007b (S1), Rolls 2007b (S2), Rolls 2007b (S3), Rolls 2010a (E1), Rolls 2010b (E2)	Baseline levels of energy expenditure have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Martins 2007	N/A
Behavioural characteristics: habitual physical exercise	Average (mean) number of hours of physical exercise completed among study completers per week (Continuous)	Participant characteristic	No	Yes	Marchiori 2012c	‐	Baseline levels of physical exercise have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Martins 2007	N/A
Biological state: hunger	Average (mean) hunger rating among study completers – 100 mm visual analogue scale (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Biological state: hunger	Average (mean) hunger rating among study completers ‐ 3‐point rating scale (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Biological state: hunger	Average (mean) hunger rating among study completers ‐ 7‐point rating scale (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Biological state: appetitive state, Fullness	Average (mean) fullness rating among study completers – 100 mm visual analogue scale (Continuous)	Participant characteristic	No	Yes	Shah 2011	‐	Baseline levels of feelings of fullness (specific somatic sensation or perceived general state of fullness (Blundell 2010)) have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Doucet 2008	N/A
Biological state: appetitive state, Satiety	Average (mean) satiety rating among study completers – 100 mm visual analogue scale (Continuous)	Participant characteristic	No	Yes	Shah 2011	‐	Baseline levels of feelings of satiety (specific somatic sensation or perceived general state of being satiated (Blundell 2010)) have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Lemmens 2011	N/A
Biological state: appetitive state, Prospective consumption	Average (mean) prospective consumption rating among study completers – 100 mm visual analogue scale (Continuous)	Participant characteristic	No	Yes	Shah 2011	‐	Baseline levels of prospective consumption (how much participants felt they could eat now (Shah 2011)) have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Doucet 2008	N/A
Other clinical characteristics: depression	Average (mean) depression score among study completers ‐ Zung Depression Inventory (Zung 1986) (Continuous)	Participant characteristic	No	Yes	Rolls 2002	Rolls 2004a, Rolls 2004b, Rolls 2007a	Baseline feelings of depression (or of affective, psychological or somatic symptoms associated with depression) have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption	Grossniklaus 2010	N/A
Socioeconomic status: occupational status	Proportion (%) of study completers in employment (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: occupational status	Proportion (%) of study completers with a parent or caregiver in employment (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: education	Average (mean) number of years of education completed among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: education	Proportion (%) of study completers who completed at least some further education (greater than high school, at least some college) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: education	Proportion (%) of study completers with a parent or caregiver who completed at least some further education (greater than high school, at least some college) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: education	Proportion (%) of study completers with a parent or caregiver who completed at least a 4‐year university degree (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: income	Proportion (%) of study completers with an individual income > USD 50,000 (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: income	Proportion (%) of study completers with a total family income > USD 50,000 (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Other measures of socioeconomic status: food insecurity	Proportion (%) of study completers living in a food insecure household (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Other measures of socioeconomic status: welfare receipt	Proportion (%) of study completers participating in the US National School Lunch Program (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Other measures of socioeconomic status: welfare receipt	Proportion (%) of study completers participating in the US School Nutrition Assistance Program (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Overall (summary) risk of bias	Low risk, unclear risk or high risk (Categorical, nominal)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A

Outcome data

As anticipated, eligible primary studies frequently included more than one measure of each target outcome construct, specifically: (a) more than one measure of selection for a given comparison, (b) more than one measure of consumption for a given comparison, or both. For each included study in which (a) or (b) applied, we extracted outcome data for use in meta‐analysis for the (a) primary selection or (b) primary consumption outcome(s) as (pre‐)specified by the study authors. If the study authors did not (pre‐)specify a single (primary) (a) selection or (b) consumption outcome, we applied the following criteria to select the (a) selection or (b) consumption measure for which outcome data would be extracted for use in meta‐analysis from a list of all available measures. We selected the measure of (a) selection or (b) consumption most proximal to health outcomes in the context of the specific intervention at hand. For example, if a study reported measures of both energy intake and the amount of food eaten (in grams), we selected energy intake as the measure of the target outcome construct most proximal to diet‐related health outcomes. We also selected the largest‐scale measure of the target outcome construct. For example, if a study manipulated the size of a portion of vegetable served as one component of a plated entrée, and measured the effects of a large versus a small vegetable portion size in terms of: (i) the amount of that vegetable consumed from the plated entrée, and (ii) the total amount of food consumed from the plated entrée, then we selected (ii) as the consumption outcome measure for which we extracted data. We made each selection in advance of data extraction, blinded to the outcome data. We recorded details of selection and consumption outcomes measures available in each included study and documented these in Characteristics of included studies.

For included studies that investigated a size manipulation, we always coded exposure to the larger of the two portions, packages, individual units or items of tableware as the intervention, whilst we always coded exposure to the smaller of the two as the comparator. For included studies that investigated a shape manipulation, we always coded exposure to the shorter, wider of the two items of tableware as the intervention, whilst we always coded exposure to the taller, narrower of the two as the comparator.

For all outcome data we collected information on: outcome variable type (in practice, this was invariably continuous); outcome variable definition; unit of measurement (natural units); specific metric (final values, change from baseline); method of aggregation (mean); timing of measurement (immediate (that is, ≤ 1 day) or longer‐term (that is, > 1 day)); and type of measure (objective, self report). For continuous outcomes, we extracted mean differences, or mean changes in final measurements from baseline measurements, for each comparison group along with associated standard deviations (or, if standard deviations were missing, standard errors, 95% confidence intervals or relevant t‐statistics, f‐statistics or exact P values that we used to calculate standard deviations); we also indicated whether a high or low value is favourable from a public health perspective. For included studies with factorial designs, we combined comparison groups so that any independent or interactive effects of the co‐occurring manipulation were averaged across the comparison groups of interest, in order to allow investigation of the independent effects of the size or shape manipulation.

Assessment of risk of bias in included studies

We assessed risk of bias in the included studies using the Cochrane 'Risk of bias' tool addressing eight specific domains, namely: random sequence generation and allocation concealment (selection bias); blinding of participants and personnel (performance bias); blinding of outcome assessors (detection bias); incomplete outcome data (attrition bias); selective outcome reporting (reporting bias); and baseline comparability of participant characteristics between groups and consistency in intervention delivery (other bias) (Higgins 2011b). The last domain refers to whether information and specific instructions provided to participants were standardised between conditions and whether participant (non‐)compliance with the study protocol was appropriately managed.

Two researchers working independently (GJH, IS) applied the Cochrane 'Risk of bias' tool to each included study. We recorded supporting information for judgements of risk of bias (high, low or unclear) in the form of verbatim text extracted from study reports, supplemented with reviewer comments. We identified and resolved discrepancies between the two researchers' judgements or supporting information by discussion to reach consensus. We derived a summary risk of bias judgement (high, low or unclear) for each specific outcome, for inclusion as a study‐level covariate in the final stage of the meta‐regression analysis (see Data synthesis). We also considered summary risk of bias in determining the strength of inferences drawn from the results of the data synthesis and in developing conclusions and recommendations concerning the design and conduct of future research. We derived the summary risk of bias judgement from the four domains judged to be most critical in this specific review, namely: random sequence generation (selection bias); allocation concealment (selection bias); blinding of participants and personnel (performance bias); and baseline comparability of participant characteristics between groups (other bias). It was derived using an algorithm suggested in Section 8.7 (Table 8.7a) of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011b). Specifically, if the judgement in at least one of these four domains was 'high risk of bias' then we determined summary risk of bias to be high. If no judgements of 'high' risk were made in these four domains, but the judgement in at least one of these domains was 'unclear risk of bias' then we determined the summary risk of bias to be unclear. We only judged summary risk of bias 'low' if judgements in all four of these domains were 'low risk of bias'.

Measures of treatment effect

We calculated the standardised mean difference (SMD) with 95% confidence intervals to express the size of the intervention effect in each study relative to the variability observed in that study. We classified included study results according to two categories of timing of outcome measurement: immediate outcomes (that is ≤ 1 day) versus longer‐term outcomes (that is > 1 day).

Unit of analysis issues

In the case of cluster‐randomised controlled trials, where an analysis was reported that accounted for the clustered study design, we estimated the effect on this basis. Where this was not possible and the information was not available from the authors, then we carried out an 'approximately correct' analysis according to current guidelines (Higgins 2011a). We imputed estimates of the intra‐cluster correlation (ICC) using estimates derived from similar studies included in the review. We also computed inflated standard errors for outcome data from cluster‐randomised controlled trials based on reported test statistics (f values, t values or P values) and used these data in all statistical analyses. Where test statistics were not available, we imputed inflated standard errors from unadjusted standard errors based on ratios of adjusted to unadjusted standard errors obtained from similar studies included in the review.

For included studies with a within‐subjects design, we calculated the standardised mean difference for continuous outcomes using the methods described in Section 16.4 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011a). Similar to our approach for cluster‐randomised controlled trials, we sought to compute deflated standard errors for outcome data from studies with a within‐subjects design based on reported test statistics, or on ratios of inflated to unadjusted standard errors obtained from similar studies included in the review. However, in studies with a within‐subjects design, these ratios exceeded one, which is counter‐intuitive and suggests there was no statistical advantage in using within‐subjects designs in this area. We therefore reverted to use of unadjusted standard errors for studies with a within‐subjects design in all statistical analyses.

Final outcome values served as the primary unit of analysis. Only one included study reported outcome data using changes from baseline as the metric (Ahn 2010). For this study we computed final values based on reported data, supplemented with additional information supplied by the authors.

Dealing with missing data

Where data were missing due to participant dropout we conducted available case analyses and recorded any issues of missing data within the assessments conducted using the Cochrane 'Risk of bias' tool.

Assessment of heterogeneity

We assessed statistical heterogeneity in results by inspection of a graphical display of the estimated treatment effects from included studies along with their 95% confidence intervals, and by formal statistical tests of homogeneity (Chi²) and measures of inconsistency (I²) and heterogeneity (τ²).

Assessment of reporting biases

We drew funnel plots (plots of effect estimates versus the inverse of their standard errors) to inform assessment of reporting biases. We conducted statistical tests to formally investigate the degree of asymmetry using the method proposed by Egger et al (Egger 1997). We interpreted the results of statistical tests based on visual inspection of the funnel plots. Asymmetry of the funnel plot may indicate publication bias or other biases related to sample size, though it may also represent a true relationship between trial size and effect size.

Data synthesis

We described and summarised the findings of included studies to address the two stated objectives of the review. We provide a narrative synthesis describing the interventions, participants, study characteristics and effects of eligible interventions upon pre‐specified outcomes (see Criteria for considering studies for this review).

Our statistical analysis of the results of included studies used a series of random‐effects and fixed‐effect models to estimate summary effect sizes as SMDs with 95% confidence intervals. We determined the final configuration of our statistical analysis based on the final version conceptual model (Figure 1). We conducted the statistical analysis using STATA (StataCorp, College Station, TX, 2014) and it comprised the following stages:

Stage 1. A standard meta‐analysis to estimate summary effect sizes for all eligible interventions versus all comparators, using metan (Harris 2008).

Stage 2. A meta‐regression analysis with type of product (food, alcohol, tobacco) as a covariate.

Stage 3. A meta‐regression analysis with study characteristics as additional covariates.

Stage 4. A meta‐regression analysis with intervention characteristics as covariates. At the protocol stage, we considered the option of conducting multivariate analysis to deal with studies with multiple treatment arms in order for direct comparisons between each treatment arm and a control condition to be modelled, using mvmeta (White 2011). In practice, we did not judge this appropriate and we conducted all meta‐regression analyses using metareg (Harbord 2008).

Stage 5. A meta‐regression analysis with participant characteristics and 'Risk of bias' assessment as covariates.

We only incorporated outcome data from independent comparisons into the statistical analysis. For example, from an included study that measured energy consumed from a lunch meal in four groups of participants served with a 275 g, a 367 g, a 458 g or a 550 g sandwich (Rolls 2004a), available pairwise comparisons are: 275 g versus 367 g, 275 g versus 458 g, 275 g versus 550 g, 367 g versus 458 g, 367 g versus 550 g, and 458 g versus 550 g. However, since these comparisons are not independent from one another, only the incremental comparisons (which are independent) were incorporated: 275 g versus 367 g, 367 g versus 458 g, and 458 g versus 550 g. Our decision to incorporate only outcome data from incremental comparisons into the statistical analysis effectively assumes a linear 'dose‐response' relationship between portion size and consumption/selection for portions of the sizes investigated in included studies. This assumption was judged reasonable by topic expert members of the review team and it is also conservative in terms of its impact on estimates of summary effect sizes. Some groups of study participants feature in two incremental comparisons (e.g. the 367 g group features in both the 275 g versus 367 g comparison and the 367 g versus 458 g comparison), therefore we halved sample sizes for groups featuring in two incremental comparisons to adjust their weighting in the analysis for this non‐independence.

Preliminary examination of outcome data revealed substantive variation in effect sizes between comparisons identified from studies that manipulated portion, package, individual unit or tableware size and those identified from studies that manipulated tableware shape. We did not judge comparisons of size conceptually comparable to comparisons of shape among the set of studies included in this review: size comparisons consisted in larger versus smaller sizes (of a portion, package, individual unit or item of tableware), whilst shape comparisons consisted in shorter, wider versus taller, narrower glasses or bottles (tableware). We therefore took the post‐hoc decision to conduct separate meta‐analyses for size and shape respectively, for both consumption and selection outcomes. (This decision effectively removed the covariate that differentiated between size and shape manipulations from subsequent meta‐regression analyses ‐ see below and Table 1). Preliminary analyses also revealed substantive variation in effect sizes between those measured in children and those measured in adults (as well as variation in effect sizes between adults of different ages), and between comparisons involving food products and those involving tobacco products. We therefore estimated supplementary summary effect sizes for these subgroups to illustrate these variations in effects. In describing the effects of size and shape interventions on selection and consumption, our narrative synthesis is disaggregated as appropriate to reflect these variations and to incorporate supplementary effect sizes estimated to illustrate them (see Effects of interventions).

We used the following procedures for meta‐regression analyses. First, for each of the two outcomes (consumption and selection), we conducted a series of univariable analyses using random‐effects models to test for a statistical association between each covariate and the study‐level effect size (SMD). All variables identified in the final version of the conceptual model (see Table 1) were candidate covariates for univariable analyses. Blinded to data extracted for covariates from study reports by two researchers (GJH, IS), topic experts within the review team selected six baseline participant characteristics to be prioritised when contacting study authors to request data on potential effect modifiers that appeared to have been measured but were missing from study reports. This selection was based on what were expected to be the most important modifiers of the effects of the intervention, primarily based on topic experts' knowledge of theory and evidence for determinants of between‐person variation in levels of food and energy intake (since the majority of studies included in this review focused on food ‐ see Description of studies). The six selected covariates (variable type) were: age (continuous), gender (categorical), BMI (continuous), dietary restraint (continuous), dietary disinhibition (continuous) and hunger (continuous). All six had been pre‐specified in the original version of the conceptual model (Figure 1) and had been measured at baseline in at least one included study. We decided in advance of conducting univariable meta‐regression analyses that candidate covariates would be excluded if they had been measured in fewer than 10 independent comparisons feeding into an analysis (insufficient data) or if there was no variation in the value of the covariate between independent comparisons feeding into an analysis (absence of variation, which precluded estimation). Based on these exclusion criteria, we conducted two series of univariable meta‐regression analyses to investigate potential modifiers of the effects of larger versus smaller portions, packages, individual units or tableware on: (a) consumption of food and tobacco; and (b) on the selection (without purchase) of food. We did not conduct other planned series of univariable meta‐regression analyses due to insufficient data following application of the exclusion criteria outlined above.

Second, we estimated random‐effects models to identify the collections of study‐level covariates that best explained the between‐studies component of the variance in study‐level estimates of effect size. As with univariable analyses, it proved possible in practice to implement this analysis to investigate potential modifiers of the effects of larger versus smaller portions, packages, individual units or tableware on: (a) consumption of food and tobacco; and (b) on the selection (without purchase) of food. We did not conduct other planned second stage analyses due to insufficient data. We selected variables for inclusion in models using a stepwise forward selection procedure. We selected first the covariate which had the largest value of R² (a measure of the proportion of the between‐studies component of the variance explained by the model) based on the results of the preceding series of univariable analyses. Next, we added each of the other covariates observed to be statistically associated with the study‐level effect size in the results of the preceding univariable analyses to the model in sequence (in an order corresponding to Stages 2 to 4 of the statistical analysis plan, outlined above in this section). Each covariate was retained in the final model if its incorporation contributed to an increase in the value of the R² but was otherwise dropped from the model. Consequently, once this procedure was completed, the final model specification maximised the value of R².

To facilitate interpretation of estimated effect sizes (Schünemann 2011), we re‐expressed a series of SMD values ranging between 0.1 and 2.5 in terms of selected metrics of food or tobacco selection/consumption. Baseline values (SMD = 0.0) reflect estimated average (mean) consumption levels among representative samples of UK adults or children and associated among‐participant variation (that is, the standard deviation). Two researchers (IS and HBL) estimated average (mean) food energy intake, non‐alcoholic beverage consumption and cigarette consumption (among smokers) using unweighted data from the UK National Diet and Nutrition Survey Years 1‐4, collected using 24‐hour dietary recall in a nationally representative UK population sample (National Centre for Social Research 2012). One researcher (IS) also estimated an alternative estimate of average cigarette consumption (among smokers) based on unweighted data from the UK Opinions and Lifestyle Survey 2012 (Office for National Statistics 2012). We used these data to re‐express SMD values in terms of the proportionate (%) and absolute changes from baseline values in terms of each selected metric and tabulated these data for illustrative purposes (see Effects of interventions). We also compared re‐expressed values among UK adults and children to those based on published estimates among equivalent US samples.

'Summary of findings' table

We used the standard GRADE system to rate the quality of the respective bodies of evidence for (1) consumption and (2) selection (with or without purchasing) outcomes in terms of the extent of our confidence in (summary) estimates of effects. GRADE criteria for assessing quality of evidence encompass study limitations, inconsistency, imprecision, indirectness, publication bias and other considerations. We recorded the justifications underpinning these assessments. We present this information in a series of 'Summary of findings' tables developed using GRADEpro GDT (Brozek 2008), alongside a summary of the estimated intervention effect and details of the numbers of studies (independent comparisons) and participants that underpinned each estimate. Our decision to present a series of 'Summary of findings' tables rather than a single table reflects our decisions to conduct separate meta‐analyses for size and shape respectively (for both consumption and selection outcomes) and to present separate summary effect sizes for food products and tobacco products (see above in this section ‐ in both cases preliminary examination of outcome data had identified substantial variation in effect sizes between studies with these variant characteristics). Separate 'Summary of findings' tables are therefore presented to summarise evidence for the (differential) effects of exposure to larger‐sized portions, packages and tableware (by product ‐ food and tobacco) and exposure to differently shaped tableware (by product ‐ food only). Within each 'Summary of findings' table, findings are grouped by outcome (consumption and selection). In addition to presenting the overall summary effect size for each outcome, we also present disaggregated summary effect sizes for subgroups of studies involving children and adults respectively (again, due to identified variation in effect sizes between those measured in children and those measured in adults ‐ see above in this section).

Sensitivity analysis

We conducted a sensitivity analysis to explore the impact of outcome data imputed due to missing data. In practice, standard deviations were the only component of outcome data that needed to be imputed for some independent comparisons due to missing data. Therefore, this sensitivity analysis in practice involved re‐estimating fixed‐effect and random‐effects meta‐analyses (for both selection and consumption outcomes – all comparisons) using imputed values for standard deviations that were (1) double and (2) half those used in the 'base case' analyses reported in the Effects of interventions section. At the protocol stage, we had also planned to conduct a sensitivity analysis to explore the separate analysis of studies of food and tobacco products. In practice, we estimated supplementary summary effect sizes for these subgroups of studies (see Data synthesis), which was functionally equivalent to this planned sensitivity analysis.

Results

Description of studies

Results of the search

The flow of studies through the systematic review process is shown in Figure 2. Electronic database searches retrieved a total of 76,279 study records, including duplicates. Searches of other resources identified 23 additional study records not retrieved by electronic database searches, comprising 15 records identified by searching reference lists of eligible study reports or forward citation tracking and eight records identified within our preceding, broader scoping review (Hollands 2013a). Automatic and manual de‐duplication identified 24,624 duplicate records, which we discarded. Therefore, 51,655 unique records entered title/abstract screening. Of these, we excluded 51,472 records and obtained corresponding full‐text study reports for the remaining 183 records assessed as potentially eligible.

Figure 2

PRISMA study flow diagram.

We excluded 101 study reports based on full‐text screening. Primary reasons for exclusion are summarised in Figure 2 (PRISMA flow diagram) and in the Characteristics of excluded studies table. A further four full‐text study reports were conference abstracts with insufficient information to enable confident assessment of eligibility (Loney 2010, Martinez 2010, Schmidt 2013, Skov 2013). Brief details of these four studies are provided in Characteristics of studies awaiting classification tables. Therefore, following exclusions, identification and linking of multiple eligible study reports of the same study and identification of study reports comprising multiple eligible studies, we have identified a total of 83 studies as meeting the eligibility criteria for this review (from 78 full‐text study reports). The number of included studies exceeds the number of included study reports due to the comparative incidences of study reports that report multiple studies (i.e. two or more studies reported in the same publication) and studies reported in single or multiple study reports among studies/reports that we identified as meeting eligibility criteria for this review.

Eligible studies included in the review

Seventy‐two of the 83 eligible studies (66 study reports) were identified by the original search initiated in November 2012 (see Search methods for identification of studies). These 72 studies, published between 1978 and July 2013, are described in the Included studies section below (with further details of each study provided in Characteristics of included studies tables) and are recorded as 'studies included in the review' in Figure 2. All remaining sub‐sections of the Results section of the current version of this review (i.e. Included studies, Excluded studies, Risk of bias in included studies and Effects of interventions), as well as its Discussion and Authors' conclusions sections, are based exclusively on evidence collected from these 72 included studies. We sought to establish contact with authors of 36 of 72 included studies to request data missing from study reports (Argo 2012 (S5); Burger 2011; Cavanagh 2013; Coelho do Vale 2008 (S2); DiSantis 2013; Fisher 2013; Flood 2006; Goldstein 2006; Jeffery 2007; Kral 2004a; Kral 2010; Levitsky 2004; Marchiori 2012a; Marchiori 2012c; Mishra 2012 (S1); Mishra 2012 (S2); Rolls 2000; Rolls 2002; Rolls 2004a; Rolls 2004b; Rolls 2006a; Rolls 2007b (S1); Rolls 2007b (S3); Rolls 2010a (E1); Rolls 2010b (E2); Russell 1980; Scott 2008b (S2); Scott 2008c (S3); Scott 2008d (S4); Spill 2010; Spill 2011b; Wansink 1996a (S1); Wansink 2001; Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2011a (S4)). We were able to establish contact with authors of 32 of these 36 studies (Burger 2011; Cavanagh 2013; Coelho do Vale 2008 (S2); DiSantis 2013; Fisher 2013; Flood 2006; Jeffery 2007; Kral 2004a; Kral 2010; Levitsky 2004; Marchiori 2012a; Marchiori 2012c; Rolls 2000; Rolls 2002; Rolls 2004a; Rolls 2004b; Rolls 2006a; Rolls 2007b (S1); Rolls 2007b (S3); Rolls 2010a (E1); Rolls 2010b (E2); Russell 1980; Scott 2008b (S2); Scott 2008c (S3); Scott 2008d (S4); Spill 2010; Spill 2011b; Wansink 1996a (S1); Wansink 2001; Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2011a (S4)), of which 20 supplied the requested information (Burger 2011; Cavanagh 2013; Coelho do Vale 2008 (S2); DiSantis 2013; Flood 2006; Kral 2010; Levitsky 2004; Marchiori 2012a; Marchiori 2012c; Rolls 2000; Rolls 2002; Rolls 2004a; Rolls 2004b; Rolls 2006a; Rolls 2007b (S1); Rolls 2007b (S3); Rolls 2010a (E1); Rolls 2010b (E2); Spill 2010; Spill 2011b). Including data supplied by study authors, 70 of 72 included studies provided useable data for meta‐analyses (104 independent comparisons) ‐ the exceptions were the studies by Argo 2012 (S5) and Goldstein 2006.

Eligible studies accepted into the review and awaiting full integration

The other 11 of the 83 eligible studies (12 study reports) were identified by the updated search (30 January 2015) (Bajaj 2014; Haire 2014; Kral 2014; Marchiori 2014; Rolls 2014a; Smith 2013a; van Ittersum 2013; van Kleef 2014; Wansink 2013; Wansink 2014; Williams 2014). These 11 studies, published during 2013 and 2014, are described in Characteristics of studies awaiting classification tables and are recorded as 'studies accepted into the review and awaiting full integration' in Figure 2. As well as describing key characteristics of each of these 11 further eligible studies, the Characteristics of studies awaiting classification tables also include provisional study‐level effect sizes (SMDs and 95% CIs) computed based on useable data provisionally extracted from 12 corresponding study reports.

It was important to establish whether the full integration of these 11 eligible studies could change the interpretation of the results of this review, and hence its conclusions, as reported below in Results, Discussion and Authors' conclusions. We therefore conducted preliminary analyses to investigate this issue using outcome data that could provisionally be extracted from each of the 11 further eligible studies. These preliminary analyses are summarised in Appendix 2. Their results establish that there is minimal potential for full integration of these 11 studies to change the interpretation of the results of this review, and hence its conclusions, as reported below in Results, Discussion and Authors' conclusions. On this basis we took the pragmatic decision (in consultation with the Cochrane Public Health Review Group) to defer full integration of these 11 studies until the first major update of this review. Therefore, as highlighted above, all results and findings presented in the remainder of the main text of this review are based exclusively on evidence collected from the 72 included studies identified by the original search up to and including 20 November 2012.

Included studies

The majority of the 72 included studies were conducted in the USA (58 of 72), with five studies from Canada (Argo 2012 (S1); Argo 2012 (S2); Argo 2012 (S4); Argo 2012 (S5); Koh 2009), three from Belgium (Marchiori 2011; Marchiori 2012a; Marchiori 2012c), two from the Netherlands (Coelho do Vale 2008 (S2); Hermans 2012), two from the UK (Kelly 2009; Russell 1980), and one study each from Australia (Cavanagh 2013) and South Korea (Ahn 2010). We identified no eligible studies conducted in low‐ or middle‐income countries (LMICs). The majority of included studies were conducted in laboratory settings (50 of 72) and the others (22 of 72) were conducted in field settings ‐ predominantly restaurants or school or workplace cafeterias (Ahn 2010; Diliberti 2004; DiSantis 2013; Ebbeling 2007; Huss 2013; Jeffery 2007; Leahy 2008; Looney 2011; Marchiori 2012c; Mishra 2012 (S1); Raynor 2007; Raynor 2009; Russell 1980; Spill 2010; Spill 2011b; Stroebele 2009; Wansink 2001; Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2005b; Wansink 2006; Wansink 2011b).

Study participants were adults (16 years or more) in 55 of 72 studies (predominantly younger adults aged 19 to 30 years), children in 16 studies (predominantly younger children aged three to six years) (DiSantis 2013; Ebbeling 2007; Fisher 2003; Fisher 2007b; Fisher 2007c; Fisher 2013; Huss 2013; Kral 2010; Leahy 2008; Looney 2011; Marchiori 2012c; Mathias 2012; Rolls 2000; Spill 2010; Spill 2011b; Wansink 2003 (S1)), and both adults and children in one study (Fisher 2007a). In the median study, participants' mean age was 22.2 years (Rolls 2002), ranging between 2.6 years (Fisher 2007c) and 55.2 years (Ahn 2010). Data on the sex of participants was available in 65 of 72 studies. The median study included 55% female participants, ranging from 0% to 100% female (interquartile range (IQR): 49 to 84). Seventy of 72 studies were conducted in low deprivation contexts, whilst the other two were conducted in high deprivation contexts (DiSantis 2013; Fisher 2007a).

In the median studies, participants' mean body mass indexes (BMIs) were 23.5 (Flood 2006; Raynor 2007) and, across all included studies, mean BMI ranged between 17.0 (Kral 2010) and 34.0 (Fisher 2007a). Mean dietary restraint score (Stunkard 1985) in the median studies was 5.8 (Flood 2006, Rolls 2006a), with a range of 4.3 (Raynor 2007) to 9.8 (Burger 2011), while mean dietary disinhibition score (Stunkard 1985) in the median studies was 4.3 (Rolls 2007b (S1); Rolls 2007b (S2)), with a range of 3.5 (Rolls 2002) to 5.3 (Burger 2011; Kral 2004a). Mean baseline hunger score (Stunkard 1985) in the median study was 4.5 (Flood 2006), with a range of 3.6 (Rolls 2007a) to 5.6 (Rolls 2004b). These results suggest that included studies examined effects in participants who were mainly unrestrained eaters (Stunkard 1985).

Sixty‐nine of 72 studies involved manipulations of food products, with the other three focused on tobacco (Jarvik 1978 (E1); Jarvik 1978 (E2); Russell 1980). No eligible studies of alcohol products were identified. The target of manipulation was the portion size in 35 of 72 studies (Burger 2011; Cavanagh 2013; Diliberti 2004; Fisher 2003; Fisher 2007a; Fisher 2007b; Fisher 2007c; Flood 2006; Goldstein 2006; Hermans 2012; Huss 2013; Jeffery 2007; Kelly 2009; Kral 2004a; Kral 2010; Leahy 2008; Levitsky 2004; Looney 2011; Mathias 2012; Rolls 2000; Rolls 2002; Rolls 2004a; Rolls 2004b; Rolls 2006a; Rolls 2006b; Rolls 2007a; Rolls 2010a (E1); Rolls 2010b (E2); Spill 2010; Spill 2011b; van Kleef 2013; Wansink 1996b (S2); Wansink 1996c (S4); Wansink 2001; Wansink 2005b). In 10 studies the target of manipulation was the package size (Argo 2012 (S1); Argo 2012 (S2); Argo 2012 (S4); Argo 2012 (S5); Coelho do Vale 2008 (S2); Ebbeling 2007; Raynor 2009; Stroebele 2009; Wansink 1996a (S1); Wansink 2011a (S4)), in six studies it was the size of individual units of a product (including in the three included tobacco studies, which all manipulated the length of cigarettes) (Devitt 2004; Jarvik 1978 (E1); Jarvik 1978 (E2); Marchiori 2011; Marchiori 2012c; Russell 1980), and in 15 studies it was the size or shape of tableware (Ahn 2010; DiSantis 2013; Koh 2009; Mishra 2012 (S1); Mishra 2012 (S2); Rolls 2007b (S1); Rolls 2007b (S2); Rolls 2007b (S3); Shah 2011; van Kleef 2012; Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2005d; Wansink 2006; Wansink 2011b). One study incorporated separate manipulations of both portion size and tableware size (Fisher 2013), and two studies incorporated separate manipulations of both portion size and package size (Marchiori 2012a; Raynor 2007). Three studies incorporated concurrent manipulations of package size and individual unit size, applied simultaneously and were therefore inherently confounded (Scott 2008b (S2); Scott 2008c (S3); Scott 2008d (S4)).

Sixty‐nine of 72 studies manipulated size, whilst the other three manipulated shape (Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2005d). Among studies that manipulated size, the larger of the two compared portions, packages, individual units or items of tableware was, on average (median) 167% (IQR: 140 to 200) of the size of the smaller version, and the mode was 200%. The larger of the two compared portions, packages, individual units or items of tableware was ≈200% of the size of the smaller version in one‐third of included food studies (independent comparisons) and fell between 120% and 159% in half of the included food studies, indicating a bimodal distribution. Absolute sizes investigated in included food studies also tended to be large compared with reference portion sizes (defined here as the size that is recommended to be consumed, or that is customarily consumed, in a single eating occasion, by one or more schemes for communicating portion size messages to consumers (Lewis 2012)) derived from a published report on typical portion sizes in the UK in 2002 (Food Standards Agency 2002). For example, the pairs of portion, package or individual unit sizes compared within included food studies both exceeded the reference portion size in 81% (34 of 42) of those independent comparisons for which these data were available and applicable (42 of 86), whilst only 5% (2 of 42) compared a (larger) portion that was ≈100% of the reference portion size with a (smaller) portion that was < 100% of the reference portion size (Food Standards Agency 2002). Reference portion sizes could not be coded for approximately half of the pairs of food product sizes compared within included studies (44 of 86) due to them manipulating tableware (for example, DiSantis 2013), or multiple products simultaneously (for example, Kelly 2009), or due to missing data.

Further details on characteristics of interventions and comparators are provided in Characteristics of included studies.

Consumption outcomes only were reported in 59 of 72 included studies (Ahn 2010; Argo 2012 (S1); Argo 2012 (S2); Argo 2012 (S4); Argo 2012 (S5); Burger 2011; Cavanagh 2013; Coelho do Vale 2008 (S2); Devitt 2004; Diliberti 2004; Ebbeling 2007; Fisher 2007a; Fisher 2007b; Fisher 2007c; Flood 2006; Goldstein 2006; Hermans 2012; Huss 2013; Jarvik 1978 (E1); Jarvik 1978 (E2); Jeffery 2007; Kelly 2009; Kral 2004a; Kral 2010; Leahy 2008; Levitsky 2004; Looney 2011; Marchiori 2011; Marchiori 2012a; Marchiori 2012c; Mathias 2012; Mishra 2012 (S1); Mishra 2012 (S2); Raynor 2007; Raynor 2009; Rolls 2000; Rolls 2002; Rolls 2004a; Rolls 2004b; Rolls 2006a; Rolls 2006b; Rolls 2007a; Rolls 2007b (S1); Rolls 2007b (S2); Rolls 2007b (S3); Rolls 2010a (E1); Rolls 2010b (E2); Russell 1980; Scott 2008b (S2); Scott 2008c (S3); Scott 2008d (S4); Shah 2011; Spill 2010; Spill 2011b; Stroebele 2009; van Kleef 2013; Wansink 2001; Wansink 2005b; Wansink 2011b). Selection outcomes only were reported in seven other studies (Wansink 1996a (S1); Wansink 1996b (S2); Wansink 1996c (S4); Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2006; Wansink 2011a (S4)), whilst both selection and consumption outcomes were reported in six other studies (DiSantis 2013; Fisher 2003; Fisher 2013; Koh 2009; van Kleef 2012; Wansink 2005d). Outcomes were measured objectively rather than by participant self report in almost all included studies with two exceptions (Ahn 2010; Jeffery 2007), and were typically measured over a period of one day or less (60 of 72 studies). Those studies that measured outcomes over a period exceeding one day were Ahn 2010, Fisher 2013, Huss 2013, Jeffery 2007, Kelly 2009, Raynor 2007, Raynor 2009, Rolls 2006a, Rolls 2006b, Rolls 2007a, Russell 1980 and Stroebele 2009.

In line with the eligibility criteria, all 72 included studies were randomised controlled trials (see Types of studies). Thirty‐eight had a within‐subjects (cross‐over) design (Burger 2011; Devitt 2004; DiSantis 2013; Ebbeling 2007; Fisher 2003; Fisher 2007a; Fisher 2007b; Fisher 2007c; Fisher 2013; Flood 2006; Huss 2013; Jarvik 1978 (E1); Jarvik 1978 (E2); Jeffery 2007; Kelly 2009; Kral 2004a; Kral 2010; Leahy 2008; Levitsky 2004; Looney 2011; Mathias 2012; Rolls 2000; Rolls 2002; Rolls 2004a; Rolls 2004b; Rolls 2006a; Rolls 2006b; Rolls 2007a; Rolls 2007b (S1); Rolls 2007b (S2); Rolls 2007b (S3); Rolls 2010a (E1); Rolls 2010b (E2); Russell 1980; Shah 2011; Spill 2010; Spill 2011b; Stroebele 2009), and the remaining 34 had a between‐subjects (parallel‐group) design (Ahn 2010; Argo 2012 (S1); Argo 2012 (S2); Argo 2012 (S4); Argo 2012 (S5); Cavanagh 2013; Coelho do Vale 2008 (S2); Diliberti 2004; Goldstein 2006; Hermans 2012; Koh 2009; Marchiori 2011; Marchiori 2012a; Marchiori 2012c; Mishra 2012 (S1); Mishra 2012 (S2); Raynor 2007; Raynor 2009; Scott 2008b (S2); Scott 2008c (S3); Scott 2008d (S4); van Kleef 2012; van Kleef 2013; Wansink 1996a (S1); Wansink 1996b (S2); Wansink 1996c (S4); Wansink 2001; Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2005b; Wansink 2005d; Wansink 2006; Wansink 2011b; Wansink 2011a (S4)). There was no evidence of funding of included studies by agencies that may have commercial interests in their results.

Excluded studies

We excluded 81 of 149 study reports identified by the original search from this review at the full‐text screening stage. We further excluded 20 of 34 study reports identified by the updated search at the full‐text screening stage. Details of the combined total of 101 excluded study reports (of 183 screened in full‐text) are provided in Characteristics of excluded studies, along with the primary reason for exclusion in each case (in two cases ‐ Just 2014 and Scisco 2012 ‐ the excluded study report comprised two ineligible studies (denoted as S1 and S2 in Characteristics of excluded studies tables), both excluded).

The most common reasons for exclusion were the lack of an eligible intervention, and the lack of an eligible study design. Illustrative examples of studies with no eligible intervention include Attwood 2012, in which participants were instructed to drink all of the product presented to them, rather than the quantity that they freely chose to drink. Bohnert 2011 examined the effects of using a specially designed plate (which gave visual indications of suggested portion size) versus a plain plate. There was no difference in the size or shape of the different plates, and the only difference was in its surface design, therefore there was no eligible intervention.

Illustrative examples of studies with an ineligible study design include Leidy 2010, in which participants were not randomly assigned between the two portion size conditions. The comparison was between two different experiments, as confirmed by correspondence with the senior author. Freedman 2010 again did not randomly assign participants, but instead appeared to report a study with a case series or uncontrolled longitudinal design.

Risk of bias in included studies

Following the procedures outlined in Assessment of risk of bias in included studies, we made a summary 'Risk of bias' assessment for each outcome. We classified seven studies from the 65 that measured consumption as at overall high risk of bias with respect to this outcome (Ahn 2010; Diliberti 2004; Goldstein 2006; Huss 2013; Mishra 2012 (S1); Raynor 2009; Wansink 2005d), with the remaining 58 studies classified as at overall unclear risk of bias. We classified nine of the 13 studies that measured selection (without purchase) as at overall unclear risk of bias with respect to this outcome (DiSantis 2013; Fisher 2003; Fisher 2013; Koh 2009; van Kleef 2012; Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2006; Wansink 2011a (S4)), with four at high risk of bias (Wansink 1996a (S1); Wansink 1996b (S2); Wansink 1996c (S4); Wansink 2005d).

Decisions regarding individual domains within the Cochrane 'Risk of bias' tool are summarised below. Figure 3 summarises risk of bias judgements across included studies and full details of review authors' judgements and support for judgements are provided for each study in 'Risk of bias' tables in Characteristics of included studies.

Figure 3

'Risk of bias' graph: review authors' judgements about each risk of bias item presented as percentages across all eligible studies (N = 83. 'Risk of bias' assessments completed for 72 eligible studies included in the review. White spaces in the bars of this graph denote the respective proportions of the 72 included studies that did not measure (i) selection or (ii) consumption outcomes. See also Results of the search and Figure 2).

Allocation

We judged the risk of allocation bias due to the procedures used to generate a randomised sequence of assignments to be unclear in 59 of 72 studies because insufficient information was provided about these procedures to permit a judgement of low or high risk. We judged the risk of bias from this source to be low in 10 studies (Ahn 2010; Ebbeling 2007; Looney 2011; Raynor 2009, Russell 1980; Spill 2010; Wansink 1996a (S1); Wansink 1996b (S2); Wansink 1996c (S4); Wansink 2005d) and high in the remaining three studies (Goldstein 2006; Huss 2013; Mishra 2012 (S1)).

We judged risk of bias due to procedures used to conceal the allocation sequence from those involved in the enrolment and assignment of participants to be unclear in 58 studies, again due to insufficient information to permit a judgement of low or high risk. We judged risk of bias from this source to be low in five studies (DiSantis 2013; Ebbeling 2007; Huss 2013; Mathias 2012; Wansink 2011b), and high in the other nine studies (Ahn 2010; Diliberti 2004; Goldstein 2006; Mishra 2012 (S1); Raynor 2009; Wansink 1996a (S1); Wansink 1996b (S2); Wansink 1996c (S4); Wansink 2005d).

Blinding

Blinding of participants and personnel

Among the 13 studies that reported selection outcomes, we judged risk of bias to be unclear in this domain due to insufficient information in eight studies (DiSantis 2013; Fisher 2003; Fisher 2013; Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2005d; Wansink 2006; Wansink 2011a (S4)), and low in the remaining five studies (Koh 2009; van Kleef 2012; Wansink 1996a (S1); Wansink 1996b (S2); Wansink 1996c (S4)).

Among the 65 studies that reported consumption outcomes, we judged risk of bias to be high in this domain in one study (Ahn 2010), low in 20 studies (Argo 2012 (S1); Argo 2012 (S2); Argo 2012 (S4); Argo 2012 (S5); Cavanagh 2013; Coelho do Vale 2008 (S2); Goldstein 2006; Hermans 2012; Koh 2009; Marchiori 2011; Marchiori 2012a; Marchiori 2012c; Raynor 2007; Raynor 2009; Scott 2008b (S2); Scott 2008c (S3); Scott 2008d (S4); van Kleef 2012; van Kleef 2013; Wansink 2011b), and unclear due to insufficient information in the remaining 44 studies.

Blinding of outcome assessment

We judged all 13 studies that reported selection outcomes to be at low risk of bias in this domain (DiSantis 2013; Fisher 2003; Fisher 2013; Koh 2009; van Kleef 2012; Wansink 1996a (S1); Wansink 1996b (S2); Wansink 1996c (S4); Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2005d; Wansink 2006; Wansink 2011a (S4)).

Among the 65 studies that reported consumption outcomes, we judged the risk of bias to be high in this domain in one study (Ahn 2010). In this study, we regarded it possible that the outcome measurement may have been influenced by a lack of blinding, because participants were instructed to keep dietary records of their own intake. We judged two other studies to be at unclear risk of bias due to insufficient information (Jeffery 2007; Stroebele 2009). We judged the remaining 62 studies to be at low risk of bias.

Incomplete outcome data

Among the 13 studies that reported selection outcomes, we judged two to be at high risk of bias for this domain (Fisher 2003; Fisher 2013), with the remaining 11 studies judged to be at low risk of bias. Of the 65 studies that reported consumption outcomes, we judged eight to be at high risk of bias (Coelho do Vale 2008 (S2); Fisher 2003; Fisher 2007c; Fisher 2013; Leahy 2008; Looney 2011; Marchiori 2011; Mathias 2012), with four studies assessed as at unclear risk of bias (Mishra 2012 (S1); Mishra 2012 (S2); Rolls 2007a; Russell 1980). We judged the remaining 53 studies as at low risk of bias. We judged studies to be at high risk of bias for this domain if > 10% of participants' data had been excluded from the analysis due to low (or zero) levels of selection or consumption, or due to being outliers.

Selective reporting

We judged 67 of 72 studies to be at unclear risk of bias in this domain. This was determined by searching for record(s) containing details of the study protocol in online trial registries (ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform (ICTRP)) and finding no corresponding records. As such, there was insufficient information to permit judgement of 'low risk' or 'high risk'. We assessed this domain to be at low risk of bias in four studies for which records were found and the comparison of the trial registry entries and published studies confirmed no selective outcome reporting (Ebbeling 2007; Fisher 2007b; Looney 2011; Raynor 2009). We classified one study as being at high risk of bias due to a discrepancy between the trial registry entry and the published study regarding the specified primary outcomes (Raynor 2007).

Other potential sources of bias

We assessed two additional potential sources of bias that we had pre‐specified as potentially important for this review: baseline comparability of participant characteristics between groups and consistency in intervention delivery.

Regarding baseline comparability of participant characteristics between groups, we judged 29 studies to be at low risk of bias (Ahn 2010; Burger 2011; Cavanagh 2013; Ebbeling 2007; Fisher 2003; Fisher 2007a; Fisher 2007c; Fisher 2013; Hermans 2012; Huss 2013; Jeffery 2007; Kelly 2009; Koh 2009; Kral 2010; Levitsky 2004; Looney 2011; Marchiori 2011; Marchiori 2012a; Marchiori 2012c; Raynor 2007; Raynor 2009; Rolls 2010a (E1); Rolls 2010b (E2); Russell 1980; Stroebele 2009; van Kleef 2012; van Kleef 2013; Wansink 2005b; Wansink 2011b). We assessed studies as being at low risk of bias in this domain if there were no differences in terms of baseline characteristics between comparison groups (study arms in the case of between‐subjects designs and condition orders in the case of within‐subjects designs), or where any observed differences in characteristics had been controlled for in the statistical analysis, or were judged by the review team to be unlikely to impact on key outcomes. We judged risk of bias to be high in this domain in the other 43 studies.

Regarding consistency in intervention delivery, we judged one study to be at high risk of bias because the bowl that was being manipulated was placed in a different location and at a different distance from participants in each comparison group (van Kleef 2012). We judged risk of bias unclear in this domain in 31 studies (Burger 2011; Devitt 2004; DiSantis 2013; Ebbeling 2007; Fisher 2003; Fisher 2007b; Fisher 2007c; Fisher 2013; Hermans 2012; Huss 2013; Koh 2009; Kral 2004a; Kral 2010; Levitsky 2004; Looney 2011; Mathias 2012; Mishra 2012 (S2); Raynor 2009; Rolls 2006a; Rolls 2006b; Rolls 2007a; Rolls 2007b (S1); Rolls 2007b (S2); Rolls 2007b (S3); Scott 2008b (S2); Scott 2008c (S3); Scott 2008d (S4); Shah 2011; Spill 2010; Spill 2011b; Stroebele 2009). We judged the remaining 40 studies to be at low risk of bias in this domain since information and instructions appeared to be standardised between comparison groups.

Effects of interventions

This section presents the results of our statistical analyses of outcome data collected from included studies. Results of meta‐analyses are presented as standardised mean differences (SMDs) with 95% confidence intervals (CIs). A rule of thumb for interpreting these effect sizes (SMDs) is as follows: 0.2 represents a small effect, 0.5 a moderate effect and 0.8 a large effect (Cohen 1988; Schünemann 2011).

However, it is perhaps more intuitive to interpret SMDs once they have been re‐expressed using a familiar metric (Schünemann 2011). Figure 4 is intended as an illustrative guide to help readers interpret the estimated effect sizes (SMDs) presented below in this section. Figure 4 re‐expresses a series of SMD values ranging between 0.1 and 2.5 in terms of selected measures of food or tobacco selection/consumption (for example, 'Equivalent change in average daily energy intake from food (kcal) selected or consumed' in the first column). Baseline values (SMD = 0.0) reflect estimated average (mean) consumption levels among representative samples of UK adults or children (see Data synthesis). For example, mean (standard deviation (SD)) daily energy intake from food among UK adults is estimated to be 1727 (± 537) kcal (National Centre for Social Research 2012). Each column of Figure 4 re‐expresses SMD values in terms of proportionate (%) and absolute changes from baseline values (reflecting observed among‐participant variation in consumption‐levels within each corresponding UK sample). For example, a SMD of 0.4 can be re‐expressed as equivalent to a 12.4% (215 kcal) increase in average daily energy intake from food, or a 27.2% (67 g) increase in the average single‐serve quantity of energy‐containing non‐alcoholic beverage, or a three to four cigarette increase in the average daily number of cigarettes, selected or consumed by UK adults.

Figure 4

Effect sizes re‐expressed using familiar metrics

It is important to use Figure 4 judiciously. First, end users of this review should consider the extent to which average (mean) baseline values and SDs reflect consumption patterns in their own country or region. For example, at 1727 (± 537) kcal, estimated mean (SD) daily energy intake from food among UK adults is slightly lower than among US adults with a smaller standard deviation (1834 ± 1013 kcal ‐ Drewnowski 2013). As such, if SMDs were re‐expressed based on data for US adults, proportionate (%) and absolute changes from baseline values would be larger than among UK adults (that is, a SMD of 0.4 would be re‐expressed as equivalent to a 22.1% (405 kcal) increase in average daily energy intake from food among US adults). Likewise, at 459 ± 370 g, estimated mean (SD) daily consumption of energy‐containing non‐alcoholic beverages among UK children is lower than daily sugar‐sweetened beverage (SSB) consumption among US children, with a smaller standard deviation (551 ± 1257 g ‐ Wang 2009). As such, if SMDs were re‐expressed based on US children's data, proportionate (%) and absolute changes from baseline values would again be larger than among UK children (that is, a SMD of 0.2 would be re‐expressed as equivalent to a 45.7% (251 g) increase in average daily SSB consumption among US children). Moreover, the inclusion of Figure 4 for illustrative purposes does not restrict the applicability of the results of this review to the UK population, nor is it intended to generalise the results to the UK population.

Second, none of the metrics shown in Figure 4 were actually measured as outcomes in the studies that were incorporated into meta‐analyses presented in this section (and we are not aware of any representative observational studies that include estimates of among‐participant variation in any of the specific measures of consumption/selection that were actually used to assess outcomes in these studies). Re‐expressing SMDs estimated using meta‐analyses as equivalent changes in other metrics therefore makes an implicit assumption that our estimates of effect size are directly transferable to these other metrics. For example, it assumes that the estimated size of the effect of (larger) size on consumption of food ‐ typically measured in included studies of food products as the quantity of food or energy consumed from a single meal (or single course within a meal) ‐ would produce the same size of effect on a person's energy intake over the course of a whole day. It is therefore important to recognise that, whilst Figure 4 offers illustrations to help guide interpretation of effect sizes estimated using meta‐analyses, it also extrapolates beyond the scope of the outcome data and source studies incorporated into those analyses.

1. Consumption

Ninety‐seven comparisons identified from 64 eligible studies assessed the effect of exposure to different sizes or shapes of portions, packages, individual units or tableware on consumption of food or tobacco by exposed participants.

1.1 Effect of larger size on consumption

We conducted a meta‐analysis to investigate the effect of exposure to larger size on unregulated consumption. Based on characteristics of the studies it incorporated, this meta‐analysis effectively investigated the effect of exposure to larger portions, packages, individual units or tableware on participants' unregulated consumption of food or tobacco. Usable outcome data were available for 92 independent comparisons, involving 6711 participants, identified from 61 eligible food or tobacco studies (Ahn 2010; Argo 2012 (S1); Argo 2012 (S2); Argo 2012 (S4); Burger 2011; Cavanagh 2013; Coelho do Vale 2008 (S2); Devitt 2004; Diliberti 2004; DiSantis 2013; Ebbeling 2007; Fisher 2003; Fisher 2007a; Fisher 2007b; Fisher 2007c; Flood 2006; Hermans 2012; Huss 2013; Jarvik 1978 (E1); Jarvik 1978 (E2); Jeffery 2007; Kelly 2009; Koh 2009; Kral 2004a; Kral 2010; Leahy 2008; Levitsky 2004; Looney 2011; Marchiori 2011; Marchiori 2012a; Marchiori 2012c; Mathias 2012; Mishra 2012 (S1); Mishra 2012 (S2); Raynor 2007; Raynor 2009; Rolls 2000; Rolls 2002; Rolls 2004a; Rolls 2004b; Rolls 2006a; Rolls 2006b; Rolls 2007a; Rolls 2007b (S1); Rolls 2007b (S2); Rolls 2007b (S3); Rolls 2010a (E1); Rolls 2010b (E2); Russell 1980; Scott 2008b (S2); Scott 2008c (S3); Scott 2008d (S4); Shah 2011; Spill 2010; Spill 2011b; Stroebele 2009; van Kleef 2012; van Kleef 2013; Wansink 2001; Wansink 2005b; Wansink 2011b).

Random‐effects meta‐analysis showed a summary mean effect size (SMD) of 0.37 (95% CI 0.29 to 0.45, P value < 0.001), suggesting that exposure to larger‐sized portions, packages, individual units or tableware increased the quantities of food or tobacco people consumed and that the relative effect size was small to moderate (Figure 5). This result was consistent between random‐effects and fixed‐effect models with the fixed‐effect model generating a SMD of 0.40 (95% CI 0.35 to 0.45). The I² statistic shows that 58.4% of the total variance in study‐level estimates of this effect was due to statistical heterogeneity (variation in true effect sizes across studies) rather than sampling error (chance). This represents substantial heterogeneity. A 95% interval for prediction of an effect in a new study similar to the included studies ranges from SMD ‐0.21 to SMD 0.96, reflecting effects ranging from a moderate reduction to a large increase in consumption. An Egger test for funnel plot asymmetry did not identify evidence consistent with the presence of publication bias (P value = 0.20) (Figure 6).

Figure 5

Forest plot of the standardised mean difference in unregulated consumption of food or tobacco between participants exposed to larger (intervention) versus smaller (control) sized portions, packages, individual units and/or tableware

Figure 6

Assessing publication bias. Funnel plots including all studies reporting the selection outcome (left) and consumption outcome (right) do not show asymmetry (Egger test P value = 0.20 and P value = 0.18 respectively)

The results of a sensitivity analysis, in which standard deviations imputed for five independent comparisons (five studies: Argo 2012 (S1); Argo 2012 (S2); Argo 2012 (S4); Mishra 2012 (S1); Mishra 2012 (S2)) were (1) doubled and (2) halved (see Sensitivity analysis), indicated that the interpretation of the results of this meta‐analysis is not influenced by changes in the values of imputed standard deviations. Summary mean effect sizes (SMDs) estimated for this sensitivity analysis using random‐effects models were (1) 0.36 (95% CI 0.28 to 0.44, P value < 0.001) and (2) 0.37 (95% CI 0.29 to 0.46, P value < 0.001), respectively. Corresponding summary mean effect sizes (SMDs) from fixed‐effect models were (1) 0.37 (95% CI 0.32 to 0.42) and (2) 0.50 (95% CI 0.45 to 0.54).

Potential modifiers of the effect of larger size on consumption

We conducted a series of meta‐regression analyses to investigate the extent to which this substantial heterogeneity could be explained by study‐level covariates. Of 71 candidate study‐level covariates, 40 were excluded due to either insufficient data (< 10 included studies) or were not estimable due to the absence of variability in data values between studies. Univariable meta‐regression analysis results for the 31 remaining study‐level covariates are presented in Appendix 3. We observed six of these covariates to be associated with the effect of larger‐sized portions, packages, individual units or tableware on the quantities of food or tobacco people consume. Below, we report results from each stage of our meta‐regression analyses (as described in the Data synthesis section) and for each stage highlight any variables that we observed to be associated with the intervention effect. We also report on any variables that the review team pre‐specified as potential effect modifiers, but which were not observed in our univariable meta‐regression analyses to be associated with the intervention effect.

Type of product (food, alcohol, tobacco)

Meta‐regression analysis did not find evidence that the effect of larger‐sized portions, packages, individual units or tableware on consumption differed by the type of product studied (i.e. between food and tobacco products ‐ there were no outcome data for alcohol products). However, based on overall low quality evidence from tobacco studies comprising 108 total participants (effective sample size), exposure to longer versus shorter cigarettes was not found to influence the quantity consumed (SMD 0.25, 95% CI ‐0.14 to 0.65) in tobacco studies, while moderate quality evidence for a small to moderate effect of exposure to larger versus smaller‐sized portions, packages or tableware was found among food studies (SMD 0.38, 95% CI 0.29 to 0.46) based on data collected from 6603 total participants (effective sample size).

Study characteristics

Effect sizes were smaller in studies with a within‐subjects design than in those with a between‐subjects design. Specifically, increases in the amount of food or tobacco consumed by participants exposed to larger‐sized portions, packages, individual units or tableware were, on average, 0.40 units smaller (95% CI ‐0.55 to ‐0.25) in studies with a within‐subjects design than in those with a between‐subjects design. Effect sizes for each of these subgroups are presented in Figure 7, showing that exposure to larger sizes increased consumption among participants in both within‐subjects and between‐subjects studies.
Effect sizes were larger in studies of less healthy food products. Specifically, each 10‐point increase in Food Standards Agency (FSA) nutrient profile score corresponded to a 0.06 unit increase (95% CI 0.04 to 0.22) in the amount of additional food consumed as a result of exposure to larger sizes.
Effect sizes were larger in studies of more energy‐dense food products. Specifically, each one‐point increase in energy density score (a component of the FSA nutrient profile score) corresponded to a 0.04 unit increase (95% CI 0.00 to 0.08) in the amount of additional food consumed as a result of exposure to larger sizes.
Effect sizes were larger in studies of food products in which the manipulated food(s) comprised all of those available in the study and all were consumed ad libitum than in the other studies of food products. Specifically, increases in the amount of food consumed as a result of exposure to larger sizes were, on average, 0.22 units larger (95% CI 0.02 to 0.41) in studies of food products in which the manipulated food(s) comprised all of those in the study and all were consumed ad libitum than in studies of food products that did not have these characteristics.
Effect sizes were larger in studies of food products in which outcome data mapped directly onto the manipulated food(s), as opposed to a wider set of foods including, but not limited to, the manipulated food(s). Specifically, increases in the amount of food consumed as a result of exposure to larger sizes were, on average, 0.32 units larger (95% CI 0.16 to 0.48) in studies of food products in which outcome data mapped directly onto the manipulated food(s) than in studies of food products in which outcome data mapped to a wider set of foods including, but not limited to, the manipulated food(s).
Meta‐regression analysis did not find evidence that the size of the effect of larger size on consumption was associated with the target of the manipulation (i.e. whether this was a portion, package, individual unit or tableware). Effect sizes for each of these subgroups are presented in Figure 7. While no evidence was found for an effect of exposure to larger‐sized packages and individual units on consumption within the 'package with individual unit' subgroup, this analysis was likely underpowered. We found evidence for this effect in all other subgroups (see Figure 7).

Figure 7

Summary effect sizes (standardised mean differences) in subgroups of studies (consumption outcome)

Intervention characteristics

In meta‐regression analysis, we observed neither the absolute nor the relative difference in size between the two portions, packages, individual units or items of tableware being compared to be associated with the effect of larger size on consumption. This pre‐planned analysis explored the relationship between relative difference in size and the effect of larger size on consumption using a linear regression that (as can be inferred from the null result) showed no convincing evidence of a linear relationship. On visual examination of the relationship, however, a pattern was apparent, with a bimodal distribution of the variable that captures the relative difference in size (that is, the variable that expresses the larger size as a proportion of the smaller size within each independent pairwise comparison ‐ see also Included studies). We therefore undertook a post‐hoc analysis in order to characterise this relationship among studies of food products (that is, limited to independent pairwise comparisons between food portion, package, individual unit or tableware sizes). Specifically, we conducted a meta‐analysis to investigate the effect of larger size on consumption among two subgroups of studies (independent comparisons) clustered around each mode of the identified bimodal distribution (see also Included studies): (1) those in which the larger‐sized portion, package, individual unit of food or item of tableware was in the range between 120% and 160% of the smaller size; and (2) those in which the larger‐sized portion, package or individual unit of food was ≈200% of the smaller size. This analysis therefore excluded outliers (that is, excluding nine independent comparisons in which the larger‐sized portion, package, individual unit of food or item of tableware was > 202% of the smaller size, from Coelho do Vale 2008 (S2), Devitt 2004, Marchiori 2012a, Raynor 2007, Raynor 2009, Shah 2011, van Kleef 2013 and Wansink 2011b ‐ range 243% to 2607%). Summary effect sizes (SMDs), estimated using random‐effects models for each subgroup, were: (1) 0.25 (95% CI 0.15 to 0.35), I² = 22% (based on 39 independent comparisons, 2415 participants); and (2) 0.50 (95% CI 0.31 to 0.69), I² = 66% (based on 25 independent comparisons, 1414 participants).

Participant characteristics

Effect sizes were larger in studies comprising older participants. Specifically, each 10‐year increase in the mean age of participants corresponded, on average, to a 0.09 unit increase (95% CI 0.00 to 0.18) in the incremental amount of food or tobacco consumed as a result of exposure to larger sizes. This result is set in the context of overall moderate quality evidence that the effect of exposure to larger size on consumption of food was present among both children (SMD 0.21, 95% CI 0.10 to 0.31 ‐ moderate quality evidence ‐ 1421 participants) and adults (SMD 0.46, 95% CI 0.40 to 0.52 ‐ moderate quality evidence ‐ 5182 participants) ‐ see Figure 7 and summary of findings Table for the main comparison. We also identified variation in this effect size between studies comprising adult participants of different ages.
We did not observe the following participant characteristics to be associated with the effect of larger size on consumption: gender, BMI, hunger, dietary restraint and dietary disinhibition.

Final regression model

A meta‐regression model was estimated to identify the collection of study‐level covariates that best explained the between‐studies component of the total variance in estimates of the effect of larger sizes on consumption. The final random‐effects model explained 91% of the between‐studies variance in effect sizes for the consumption outcome (R² = 90.77%, P value = 0.001), leaving 9% unexplained. This model incorporated the following five covariates, each of which had been identified as a potential modifier of the effect of larger sizes on consumption based on observed associations in univariable meta‐regression analyses: study design (within‐subjects or between‐subjects); FSA 'nutrient profile score'; FSA 'energy density score'; participants' mean age; and a variable differentiating studies of food products in which the manipulated food(s) comprised all of those available in the study and all were consumed ad libitum from other food studies. The variable differentiating food studies, in which outcome data mapped directly onto the manipulated food(s) as opposed to a wider set of foods, was excluded from the final model for two reasons: first, its addition did not increase the adjusted R² and second, due to its collinearity with the study design covariate (within‐subjects or between‐subjects). Not all of the five incorporated covariates were independently predictive of effect size (consumption) in the final model. Figure 8 comprises three bubble plots that show associations between study‐level effect sizes (effect of larger size on consumption) and each of the three continuous variables identified as potential effect modifiers: FSA 'nutrient profile score'; FSA 'energy density score'; and participants' mean age.

Figure 8

Bubble plots. Fitted meta‐regression lines showing associations between study‐level effect sizes for consumption and study characteristics (continuous variables) identified as effect modifiers: a) FSA score; b) energy density; c) age.

1.2. Effect of shape on consumption

One food study involving 50 adult participants investigated the effect of shape on unregulated consumption (Wansink 2005d). This study investigated the effect of being provided with shorter, wider (versus taller, narrower) empty clear plastic bottles on the quantities of water selected and consumed one hour after vigorous physical activity in a sample of US Army and Marine Reserve Officer's Training Corps students. It reported an effect size (SMD) of 1.17 (95% CI 0.57 to 1.78), assessed as very low quality evidence for a large effect of shorter, wider bottles on quantities of water consumed, given that participants provided with shorter, wider bottles had more water available for consumption than those provided with taller, narrower bottles due to having selected (poured) more in the first place (see Potential modifiers of the effect of shape on selection without purchase, below).

Potential modifiers of the effect of shape on consumption

Investigation of potential modifiers of the effect of shape on consumption was not possible as only one study (comprising one comparison) investigated this effect (Wansink 2005d).

2. Selection

Seventeen comparisons identified from 14 eligible studies assessed the effect of exposure to different sizes or shapes of portions, packages or tableware on quantities of food selected for consumption by exposed participants. No studies investigated this effect in relation to alcohol or tobacco products. None of the 17 comparisons involved purchasing of the food selected for consumption (that is, all measured unregulated selection without purchase).

2.1. Effect of larger size on selection without purchase

We conducted a meta‐analysis to investigate the effect of exposure to larger size on unregulated selection without purchase. Based on characteristics of the studies it incorporated, this meta‐analysis effectively investigated the effect of exposure to larger‐sized portions or tableware on participants' unregulated selection without purchase of food. Usable outcome data were available for 13 comparisons, involving 1164 participants, identified from 10 eligible food studies that we assessed as being at unclear or high risk of bias (DiSantis 2013; Fisher 2003; Fisher 2013; Koh 2009; van Kleef 2012; Wansink 1996a (S1); Wansink 1996b (S2); Wansink 1996c (S4); Wansink 2006; Wansink 2011a (S4).

Random effects meta‐analysis showed a mean summary effect size (SMD) of 0.42 (95% CI 0.24 to 0.59, P value = 0.011), providing overall moderate quality evidence that exposure to larger‐sized portions, packages, individual units or tableware increased the quantities of food people selected for consumption and that the relative effect size was on average small to moderate (Figure 9). This result was consistent between random‐effects and fixed‐effect models, with the fixed‐effect model generating a SMD of 0.40 (95% CI 0.28 to 0.52). The I² statistic indicated that 53.5% of the total variance in study‐level estimates of this effect was due to statistical heterogeneity (substantial heterogeneity). A 95% interval for prediction of an effect in a new study similar to the included studies ranges from SMD ‐0.14 to SMD 0.97, reflecting effects ranging from a small reduction to a large increase in quantity of food selected. An Egger test for funnel plot asymmetry did not identify evidence consistent with the presence of publication bias (P value = 0.18) (Figure 6).

Figure 9

Forest plot of the standardised mean difference in unregulated selection (without purchase) of food between participants exposed to larger (intervention) versus smaller (control) sized portions, packages and/or tableware

The results of a sensitivity analysis, in which standard deviations imputed for one independent comparison (one study: Wansink 1996c (S4)) were (1) doubled and (2) halved (see Sensitivity analysis), indicated that the interpretation of the results of this meta‐analysis is robust to changes in the value of the imputed standard deviation. Summary mean effect sizes (SMDs) estimated for this sensitivity analysis using random‐effects models were (1) 0.42 (95% CI 0.23 to 0.60, P value < 0.001) and (2) 0.41 (95% CI 0.25 to 0.58, P value < 0.001) respectively. Corresponding summary mean effect sizes (SMDs) from fixed‐effect models were (1) 0.42 (95% CI 0.28 to 0.52) and (2) 0.40 (95% 0.30 to 0.50).

Potential modifiers of the effect of larger size on selection without purchase

We conducted a series of meta‐regression analyses to investigate the extent to which this substantial heterogeneity in effect sizes could be explained by study‐level covariates. These analyses were limited by low statistical power. Most of the 71 candidate study‐level covariates were excluded due to either insufficient data (< 10 included studies) or were not estimable due to the absence of variability in data values between studies. A full set of results of these univariable meta‐regression analyses is provided in Appendix 4. Of 15 study‐level covariates investigated in these analyses, we observed two to be associated with the effect of larger‐sized portions, packages and/or tableware on the quantities of food participants selected for consumption. Below, we report results from each stage of our meta‐regression analyses (as described in the Data synthesis section) and for each stage highlight any variables that we observed to be associated with the intervention effect. We also report on any variables that the review team pre‐specified as potential effect modifiers, but which were not observed in our univariable meta‐regression analyses to be associated with the intervention effect.

Type of product (food, alcohol, tobacco)

This was excluded due to absence of variation in product type between included comparisons: all comparisons related to food products.

Study characteristics

Effect sizes were smaller in studies with a within‐subjects design than in those with a between‐subjects design. Specifically, increases in the quantities of food selected as a result of exposure to larger‐sized portions or tableware were, on average, ‐0.41 units smaller (95% CI ‐0.76 to ‐0.06) among studies with a within‐subjects design than among those with a between‐subjects design. Effect sizes for each of these subgroups presented in Figure 10 further indicate that exposure to larger sizes was observed to be associated with increased selection of food among participants in between‐subjects studies but not among participants in within‐subjects studies.
Effect sizes were larger in studies of food products in which outcome data mapped directly onto the manipulated food(s), as opposed to a wider set of foods including (but not limited to) the manipulated food(s). Specifically, increases in the quantities of food selected as a result of exposure to larger sizes were, on average, 0.41 units larger (95% CI 0.06 to 0.76) in the former subgroup than in the latter.
Meta‐regression analysis did not find evidence that the size of the effect of larger size on selection of food was associated with the target of the manipulation (i.e. whether this was a portion or an item of tableware). Effect sizes for each of these subgroups are presented in Figure 10, which shows that evidence for this effect was found in both studies manipulating portion size (SMD 0.30, 95% CI 0.09 to 0.50) and those manipulating tableware size (SMD 0.51, 95% CI 0.21 to 0.81).

Figure 10

Summary effect sizes (standardised mean differences) in subgroups of studies (selection outcome)

Intervention characteristics

In meta‐regression analysis, we did not observe the relative difference in size between the two portions or items of tableware being compared to be associated with the effect of larger size on selection without purchase. The potential association between this effect and absolute difference in size could not be investigated due to insufficient data.

Participant characteristics

Potential associations between the effect of larger size on selection and the following participant characteristics could not be investigated using meta‐regression analysis due to insufficient data: age, BMI, hunger, dietary restraint and dietary disinhibition. We observed no association between this effect and participants' gender. The results of an illustrative analysis presented in Figure 10 indicate that the effect of exposure to larger size on selection of food was present among adults (SMD 0.55, 95% CI 0.35 to 0.75 ‐ moderate quality evidence ‐ 782 participants) but not among children (SMD 0.14, 95% CI ‐0.06 to 0.34 ‐ low quality evidence ‐ 382 participants) ‐ see also summary of findings Table for the main comparison.

Final regression model

Variation in study design (within‐subjects versus between‐subjects) alone explained 79% of the statistical heterogeneity observed in the effect of (larger) size on selection of food (R² = 79.46%), leaving 21% unexplained. The covariate of outcome data mapping directly onto the manipulated food(s) also explained 79% of this statistical heterogeneity (R² = 78.77%), leaving 21% unexplained. A meta‐regression model containing both of these covariates identified as potential effect modifiers could not be estimated due to perfect collinearity. As such the independent effect modifying influences of these two covariates cannot be disentangled. There are at least two plausible complementary explanations for the result that variation in study design explained a large proportion of this statistical heterogeneity. First, all those studies included in the meta‐analysis of the effect of larger size on selection that had a within‐subjects design measured this effect in children, whilst all those with a between‐subjects design measured it in adults. As highlighted above, the results presented in Figure 10 provide an indication that the effect of exposure to larger‐sized portions or items of tableware on quantities of food selected was found in studies of adults but not in studies of children. Second, all source studies included in this meta‐analysis that had a within‐subjects design were conducted by teams from one research centre, as (largely) were source studies that had a between‐subjects design.

2.2. Effect of shape on selection without purchase

We conducted a meta‐analysis to investigate the effect of shape on unregulated selection. Given the characteristics of studies included in this meta‐analysis, it effectively investigated the effect of being provided with shorter, wider empty glasses or plastic bottles on participants' unregulated selection (without purchase) of fruit juices or water in a single, self serve setting. Usable outcome data for this meta‐analysis were available for three comparisons, involving 232 participants, identified from three eligible food studies assessed as being at unclear or high risk of bias (Wansink 2003 (S1); Wansink 2003 (S2); Wansink 2005d).

Random‐effects meta‐analysis showed a mean summary effect size (SMD) of 1.47 with wide confidence intervals (95% CI 0.52 to 2.43). This result provides overall low quality evidence that exposure to shorter, wider glasses or plastic bottles increased the quantities of fruit juices or water people selected for consumption and that the relative size of this effect was very large (Figure 11). This result was consistent between random‐effects and fixed‐effect models with the fixed‐effect model generating a SMD of 1.39 (95% CI 1.10 to 1.69). Although 95% confidence intervals were wide, the lower bound of 0.52 based on the random‐effects model still represents a moderate effect size. The I² statistic from the random‐effects model shows that 90.1% of the total variance in study‐level estimates of this effect was due to statistical heterogeneity (considerable heterogeneity).

Figure 11

Forest plot of the standardised mean difference in unregulated selection without purchase of fruit juices or water between participants exposed to shorter, wider (intervention) versus taller, narrower (control) empty glasses or plastic bottles

Potential modifiers of the effect of shape on selection without purchase

We conducted no meta‐regression analyses to investigate the extent to which this statistical heterogeneity could be explained by study‐level covariates, due to insufficient data. However, it is likely that the considerable between‐studies variance in estimates of this effect may be attributable to the influence of variations between the three source studies providing data incorporated into this meta‐analysis in terms of their participants, interventions, comparisons and settings. Although Wansink 2003 (S1) and Wansink 2003 (S2) both investigated the effect of being provided with shorter, wider (versus taller, narrower) empty glasses on quantities of fruit juices selected by participants from a cafeteria line for consumption at breakfast, the former investigated this effect in a sample of adolescents (aged 12 to 17 years) attending a six‐week health and fitness camp who were motivated as a group to lose weight as well as trained to monitor how much they consumed, whilst the latter investigated the effect in a convenience sample of adults attending a weekend camp on jazz improvisation. The third source study, Wansink 2005d, investigated the effect of being provided with shorter, wider (versus taller, narrower) empty clear plastic bottles on the quantities of water selected for consumption one hour after vigorous physical activity in a sample of US Army and Marine Reserve Officer's Training Corps students. The study conducted in children, Wansink 2003 (S1), comprised 96 participants and found a SMD of 2.31 (95% CI 1.79 to 2.83 ‐ low quality evidence), whilst the estimated summary effect size in the subgroup of two studies conducted in adults, Wansink 2003 (S2) and Wansink 2005d, comprising 136 participants, was SMD 1.03 (95% CI 0.41 to 1.65 ‐ low quality evidence).

Discussion

Summary of main results

Main effects of size and shape on consumption and selection

Size

A clear finding of this review is that people exposed to larger‐sized portions, packages, individual units or tableware consistently consumed larger quantities of food compared with those exposed to smaller sizes. We rated the overall quality of evidence for a small to moderate effect of portion, package, individual unit or tableware size on food consumption among both children and adults as moderate. This quality rating confers confidence that the true effect is likely to be close to the estimated effect size (that is, small to moderate), but leaves open the possibility that it may be substantially different.

If sustained across the whole diet, the summary effect size attributable to these differences in product size would be equivalent to an absolute change in average daily energy intake from food (that is, energy intake from food and non‐alcoholic beverages, but excluding energy intake from alcoholic beverages and dietary supplements) of 215 to 279 kcal among UK adults (a 12% to 16% change from a baseline of 1727 kcal per day ‐ see Figure 4) (National Centre for Social Research 2012). Sustained reductions in daily energy intake from food of this size would have the potential to make meaningful contributions to the prevention and treatment of major risk factors for non‐communicable diseases. For example, 10‐year weight gain between 1999 and 2009 among adults in England (that is, 9 kg at the 90th percentile) has been estimated to be equivalent to extra energy intake of around 24 kcal per day over the same period (Department of Health 2011). Any sustained reductions in daily energy intake exceeding this level are therefore likely to be effective in helping to prevent further weight gain in the population (Department of Health 2011). In relation to the treatment of obesity, the UK National Institute for Health and Care Excellence recommends that adults should lose no more than 0.5 to 1 kg (1 to 2 lb) a week (NICE 2014). This rate of weight loss equates to an energy deficit of 500 to 1000 kcal per day. Although this target energy deficit is some way beyond the effect sizes that could feasibly be achieved by interventions to reduce portion size alone (based on our summary estimate of this effect among studies included in the review), our result suggests that interventions of this kind could meaningfully contribute to helping patients achieve such a target if their effects were sustained. Whilst these illustrations highlight the promise of interventions to reduce exposure to larger portion sizes, it is important to highlight that the sustainability of effects remains to be established, since studies included in this review were limited to the investigation of one‐off or repeated exposures over short time periods (see also Implications for practice and Implications for research). Moreover, very few studies included in this review investigated effects among samples of participants motivated to lose weight, further limiting inferences that can be drawn with respect to obesity treatment.

We also found overall moderate quality evidence for a small to moderate effect of portion or tableware size on food selection among adults. Adults consistently selected larger quantities of food for consumption when exposed to larger sizes (compared with exposure to smaller sizes). This result is consistent with the role of food selection as an important intermediate endpoint in pathways to consumption. If we assumed that all food selected for consumption were consumed and that this effect size were sustained over time (noting again that we found no evidence for sustainability of effects), it would be equivalent to an absolute change in average daily energy intake from food of 188 to 403 kcal among UK adults (an 11% to 23% change from a baseline of 1727 kcal per day ‐ see Figure 4) (National Centre for Social Research 2012). Whilst we did not find an effect of portion or tableware size on food selection among children, this result was based on overall low quality evidence from a small number of studies (independent comparisons), which confers limited confidence in our estimate of this effect (that is, the true effect among children may be substantially different from our estimate).

We did not find evidence for an effect of individual unit size on consumption of tobacco, based on a meta‐analysis of data collected from studies that investigated exposure to longer versus shorter cigarettes among adult smokers. However, this finding was again based on overall low quality evidence from a small number of older studies. We did not identify any eligible studies that investigated the effects of exposure to differently sized cigarette packs (for example, packs of 20 cigarettes versus packs of 10 cigarettes). Nor did we identify any eligible studies that investigated the effects of exposure to differently sized alcoholic beverage products (or tableware, such as glasses, used to consume such products).

Shape

This review found overall very low quality evidence from a single included study for a large effect of exposure to shorter, wider (versus taller, narrower) plastic bottles on the quantities of water participants consumed in a single‐serve context (Wansink 2005d). In this study, participants provided with shorter, wider bottles had more water available for consumption in the first place (due to having already selected more by pouring more into their bottles from a 10 gallon container) than participants provided with taller, narrower bottles. The 'very low quality' rating means that we have little confidence in the estimate of this effect (that is, the true effect is likely to be substantially different from our estimate).

We also found overall low quality evidence for a large to very large effect of exposure to shorter, wider (versus taller, narrower) glasses or plastic bottles on the quantities of fruit juice or water participants selected for consumption in a single‐serve context. If the effect size we estimated were transferable to energy‐containing non‐alcoholic beverages (Figure 4), it would be equivalent to an absolute change of 292 to 462 grams in the average quantity of these beverages selected in a single‐serve context among UK children (a 128% to 203% change from a baseline of 228 grams per serve) or 68 to 274 grams among UK adults (a 28% to 112% change from a baseline of 245 grams per serve) respectively (National Centre for Social Research 2012). We rated the quality of evidence as low with respect to our estimates of this effect, which again confers limited confidence in their accuracy. The findings are, however, consistent with long‐established psychological theory and evidence concerning the perceptual biases associated with exposure to differently shaped receptacles (Piaget 1969). While it seems unlikely that interventions that successfully reduced exposure to shorter, wider drinking receptacles (or conversely, increased exposure to taller, narrower versions) could in practice achieve sustained reductions in self served quantities of energy‐containing non‐alcoholic beverages (or increases in self served quantities of healthier alternatives) of this magnitude, this awaits study.

Moderators of main effects

As reflected in the discussion of main effects, our results indicated that the effects of portion, package, individual unit or tableware size may be modified by the age of those exposed to such manipulations. Whilst there was evidence that children and young people exposed to larger sizes still consumed more food, the size of this effect was found to be larger among adults, also increasing (albeit by very small incremental amounts) with the age of those exposed. These results suggest that intervening to reduce exposure to larger sizes of portions, packages, individual units or tableware may be more effective in influencing food consumption among adults than among children. This finding appears consistent with suggestions in the literature that as people age, external cues to consumption play an increasingly important role in the regulation of energy intake relative to internal cues, such as hunger and satiety (Ello‐Martin 2005). This phenomenon has been observed in children, but we are not aware of any current evidence for whether this process continues over the adult life course.

It is noteworthy that, with the exception of age, no evidence was found in this review to support claims that the effects of exposure to different portion, package, individual unit or tableware sizes vary between men and women, between individuals with a different body mass index, or between those with different baseline levels of dietary restraint, dietary disinhibition or hunger (that is, those participant characteristics identified in advance as most likely to modify effects). With respect to gender and body mass index, we note that these findings differ from those suggested by the results of another recent review of food portion size effects (Zlatevska 2014). In relation to gender and amounts consumed, Zlatevska and colleagues found that female participants responded less to a doubling of portion size than did male participants (Zlatevska 2014). In relation to body mass index and amounts consumed, they found that overweight participants responded less to a doubling of portion size than did non‐overweight participants (Zlatevska 2014) ‐ a result which the authors highlight was unexpected since it challenges previous research suggesting that overweight people may be less sensitive to satiation and more sensitive to external cues than those who are not overweight (Wansink 2007b).

We were unable to examine effect moderation by study participants' socioeconomic status in this review due to the infrequency of reporting of such measures across included studies (this was one component of analysis intended to inform assessment of social differentiation in effects relevant to health equity ‐ seeObjectives and further, related discussion in Overall completeness and applicability of evidence). Socioeconomic status therefore remains an important potential moderator of the effects of sizing interventions that deserves closer attention in future research (see Implications for research).

We did, however, find evidence that this effect of size on consumption may be moderated by the type of food, specifically characterised by the healthiness and energy density of the manipulated food(s), with larger effects found in studies that manipulated less healthy products and in those that manipulated more energy‐dense products (albeit by very small incremental amounts) (see Implications for practice for further discussion of these tentative findings).

We found little evidence consistent with the proposal that the observed effects of size on consumption or selection may differ depending on whether it is the size of a portion, package, individual unit or item of tableware size that is altered. This finding indicates that interventions that successfully reduce exposure to larger sizes can be effective across a range of targets for manipulation.

However, we did identify some evidence to indicate that between‐study variation in the effect of larger size on food consumption may be attributable in part to between‐study differences in the relative size of the two portions, packages, individual units or items of tableware being compared. Although this finding is based on the results of a post‐hoc subgroup analysis (see Effects of interventions), we note that the results are consistent with our prior assumptions that the dose‐response relationship between portion size and consumption or selection would be linear at many of the sizes investigated (see Data synthesis), but that at extremes a non‐linear relationshipcould be expected due to a ceiling effect: external cues, such as social norms or perceptual biases that indicate a given amount of a product is appropriate, will eventually give way to internal cues to stop consuming, such as satiety. A recent analysis that plotted the absolute portion size served to each group of participants among included studies against the average (mean) amount of food they consumed from that portion also found a relationship of this kind (Zlatevska 2014). We reiterate (as stated in Included studies) that absolute sizes investigated in included food studies tended to be large compared with reference portion sizes, derived from a published report on typical portion sizes in the UK in 2002 (Food Standards Agency 2002). Knowledge of how the sizes of portions, packages and tableware investigated among included studies compare with reference portion sizes for those foods in different settings was not fully elucidated by this review due to the limited scope and availability of data (from included studies and external sources) to fully address it. However, this remains a critical issue for determining the policy implications of our findings concerning the effects of larger size on selection and consumption (see further commentary on this issue in Overall completeness and applicability of evidence and Implications for practice).

Meta‐regression analyses identified two further variables as potential moderators of the main effects of size on both consumption and selection, both methodological variables. The first variable delimits studies with a within‐subjects design and those with a between‐subjects design (effect sizes were larger in between‐subjects studies). We cannot fully explain this result. It may be an artefact of the different methods used to measure effects in between‐subjects and within‐subjects designs respectively: there are two independent groups in the former but only one group (with repeated measures for each participant) in the latter. Alternatively, the result may be due to factors related to the choice of design, including other methods and procedures applied by research centres using different study designs. The second variable distinguishes studies of food products in which the manipulated food(s) comprised all of those available in the study from all other studies (effect sizes were larger in the former studies). Providing additional foods for study participants to consume beyond those that were manipulated may result in additional energy consumption in either or both comparison groups, with the potential to modify the effect of larger sizes due to the same ceiling effect described above.

It is important to avoid over‐interpretation of the results of the meta‐regression analyses we conducted due to their observational nature, limited statistical power and multiple tests, which meant heightened probability of type I (obtaining a false positive result) and type II (obtaining a false negative result) errors. These results should therefore be viewed primarily as generating hypotheses about potential effect modifiers that will need to be investigated in further studies, with patterns of results replicated, before more confident inferences can be drawn.

Overall completeness and applicability of evidence

The evidence synthesised in this review was collected from 72 included studies that featured 107 eligible independent comparisons between two different sizes or shapes of portions, packages, individual units or tableware used to consume food products (69 of 72 included studies), or between two different sizes (lengths) of individual units of tobacco products (cigarettes) (3 of 72 included studies). The effective sample sizes feeding into meta‐analyses of outcome data collected from included food studies typically exceeded numbers generated by a conventional sample size calculation for a single adequately powered trial (that is, the optimal information size), which strengthens confidence that these studies were sufficient to enable us to address our first objective to assess the effects of eligible interventions on unregulated selection or consumption of food products in adults and children. Moreover, included food studies encompassed a range of participants in terms of their age, gender and other trait or state characteristics, a range of specific manipulations (for example, various types of foods), and a variety of eating or drinking contexts (encompassing both laboratory and naturalistic field settings). This confers a degree of confidence that our findings concerning food are likely to be widely applicable. It was also possible to exploit variations between included studies to investigate and attempt to explain observed variations in effects, addressing the second objective of this review to assess potential effect modifiers. This allowed us to report observed associations that, if confirmed by further research, may prove useful in configuring and targeting sizing interventions for maximum effectiveness (see Implications for practice).

Eligible studies typically investigated exposures that were one‐off or, if repeated, were repeated over relatively short time periods, and participants' selection and consumption responses were typically measured over correspondingly immediate or short time periods. In addition, the laboratory and naturalistic field settings in which participants were exposed and had their selection and consumption responses measured were often highly controlled by the researchers. These findings highlight the current lack of evidence to establish whether meaningful changes in the quantities of food people consume can be sustained over the longer term in response to prolonged or repeated exposures, under free‐living conditions.

In terms of intervention characteristics, the distribution of evidence for effects on selection and consumption of food was skewed towards pairwise comparisons in which the difference in relative size of the portions, packages, individual units or tableware was large. In addition, the absolute sizes investigated in food studies tended to be large. Therefore, while included food studies did cover a range of absolute and relative sizes, further studies focusing on smaller incremental changes at the smaller end of the portion size continuum are needed to strengthen the evidence base in this respect.

As highlighted above (see Summary of main results), knowledge of how the absolute sizes of food portions and packages investigated among studies included in this review compare with reference portion sizes for those specific foods (defined here as the size that is recommended to be consumed, or that is customarily consumed, in a single eating occasion, by one or more schemes for communicating portion size messages to consumers (Lewis 2012)) is critical to the interpretation of the results of this review. However, this relationship is both complex and dynamic. Alongside variation between specific food products within each scheme, there is also variation between reference portion sizes for comparable products between schemes and jurisdictions (for example, recommended amounts may be defined by food manufacturers, food retailers, government agencies or non‐governmental organisations, and may provide general advice or weight‐loss advice (Institute of Grocery Distribution 2008; Lewis 2012)). Schemes that provide reference portion size information based on amounts customarily consumed are also typically based on analysis of dietary intake within a defined population, which will also vary between population subgroups and over time; estimates from some schemes still in current use may therefore diverge from current dietary intakes due to their age (for example, the US Food and Drug Administration's Reference Amounts Customarily Consumed are largely based on data published in 1993 (USFDA 2014)). It is therefore important to highlight that our discussion of potential policy actions that would be consistent with the evidence in this review concerning the effects of size on consumption of food (see Implications for practice, below) is necessarily tempered by consideration of where this body of evidence may be located on the 'absolute size continuum'. Our observation that the absolute sizes investigated in food studies tended to be large is based primarily on comparison with external data, derived from ranges of typical dietary intakes (amounts customarily consumed in a single eating occasion), that were published in 2002 (Food Standards Agency 2002), which may not be transferable to the present day or other settings. The key message is that we urge caution in extrapolating the results of this review beyond the range of relative size differences between, and/or the absolute sizes of, portions, packages and tableware sizes investigated among included studies.

Specifically, the limited body of evidence identified for the consumption effects of exposure to different portion, package and tableware sizes at the smaller end of the size continuum means that we cannot be certain whether reducing portions at the smaller end of the size range can be as effective in reducing food consumption as reductions at the larger end of the range. There may also be some potential for unintended effects of exposure to small portions. Exposure to smaller portions than those typically encountered could sometimes lead to increased consumption. One possibility is that people may avoid selecting or consuming larger portions of products they perceive as unhealthy, but allow themselves to indulge when those products are presented in small sizes, thereby shifting from no consumption to some. The potential for unintended compensatory effects (that is, compensating for smaller portions by eating more later in the day), whilst not evident from individual studies we have encountered (Jeffery 2007; Kral 2004a; Lewis 2015; Vermeer 2011), is another related issue that deserves close attention.

We judged few participant samples in included food studies to be characterised by high levels of material or social deprivation; few studies measured participants' socioeconomic status and no studies reported effects disaggregated by socioeconomic subgroup. Moreover, evidence for effects on selection and consumption of food was derived mainly from studies conducted in US samples, with no included studies conducted in low or middle‐income countries (LMICs). These factors largely precluded any assessment of social differentiation in effects relevant to health equity (with the exception of gender ‐ see Effects of interventions, 'Potential modifiers of the effect of larger size on consumption') (see alsoObjectives). We have no reasons to expect that cognitive biases proposed as mechanisms by which exposure to these interventions may influence food selection and consumption (for example, 'unit bias') will differ substantively between people living in high‐income countries (HICs) and those living in LMICs (see How the intervention might work). However, people living in HICs are likely to have different personal and social (descriptive and injunctive) norms about what constitutes a suitable amount of food to consume than those living in LMICs and such factors have been proposed to influence the effects of exposure to larger sizes on food selection and consumption. A range of other social, cultural, economic and contextual differences surrounding diet‐related behaviours between people living in HICs and LMICs may also plausibly modify these effects. For these reasons, the predominance of US evidence may limit the applicability of findings of this review to LMICs (and also to other HICs) to some extent.

This review identified three studies that investigated the effects of exposure to longer versus shorter cigarettes on tobacco consumption (Jarvik 1978 (E1); Jarvik 1978 (E2); Russell 1980). We did not identify any tobacco studies investigating the effects of exposure to different sizes (or shapes) of cigarette packs, which may be an alternative target for interventions to reduce exposure to single cigarettes or packs containing smaller than standard numbers of cigarettes. Applicability of the evidence derived from the three included tobacco studies we did find, published in 1978 and 1980, may be limited by its age. The small effective sample size (six independent comparisons, 108 participants) contributing to our meta‐analysis from these studies further weakens confidence that they provided sufficient evidence to allow us to address the first objective of this review with respect to tobacco products. The true effect of exposure to longer versus shorter cigarettes on tobacco consumption is likely to be substantially different from our summary estimate. Results based on evidence from tobacco studies should therefore be interpreted with caution.

The most notable gap in this evidence base, however, was the absence of any randomised controlled trials investigating effects on unregulated selection or consumption of alcoholic beverage products. This finding is in keeping with the small proportion of studies on alcohol, compared with food products, which we found in a large scoping review of interventions that involve altering the properties or placement of objects or stimuli within small‐scale micro‐environments to change health behaviour, of which 'sizing interventions' was just one type (Hollands 2013a; Hollands 2013b). One possible reason for the current dearth of studies on alcohol is that this reflects the focus of recent alcohol policies on reducing consumption in harmful and hazardous drinkers through individual‐level interventions (Kaner 2009). Interventions that target price can reduce consumption of alcohol across populations (Holmes 2014; Wagenaar 2009), but such interventions are generally unacceptable to industry, politicians and the general public (Diepeveen 2013). More recent evidence regarding the harmful effects on population health of alcohol consumption at moderate levels (Rehm 2015) may extend the research focus to include interventions in micro‐environments such as those pertaining to size.

Quality of the evidence

Ratings of the overall quality of evidence incorporated into this review ranged between moderate and very low, which leaves open the possibility that our estimates of intervention effects differ substantially from true effects. Confidence in estimates of effects was diminished by serious concerns about study limitations, which were primarily raised by unclear and incomplete reporting of study methods and procedures by authors of included studies. Indeed, we identified limitations in study reporting and/or conduct with respect to each of the domains judged most critical to 'Risk of bias' assessment in this review: random sequence generation (selection bias); allocation concealment (selection bias); blinding of participants and personnel (performance bias); and baseline comparability of participant characteristics between groups (other bias). Given the nature of the included studies, we could not identify any obvious reason to prevent the straightforward implementation of unbiased methods and procedures for random sequence generation and allocation concealment. The use of within‐subjects designs precluded the blinding of participants in over half of the included studies, but we did not judge lack of blinding to place studies at high risk of bias in this domain due to a general lack of evidence for the presence and potential influence of carry‐over effects among included studies. We did not consider blinding of personnel (that is, intervention providers) to be a relevant consideration in assessing risk of bias in included studies because personnel were not judged instrumental in delivery of the intervention. Finally, while it may not always be practical to test such differences in applied field settings, in many instances baseline comparability of participant characteristics between comparison groups can and should be examined.

We identified few concerns regarding inconsistency in study results, since in general large amounts of unexplained inconsistency did not remain following planned investigations of potential effect modifiers using meta‐regression analyses. There were no serious concerns about the directness of the assembled evidence either, since it was all derived from studies that directly compared the interventions in which we were interested, in groups of eligible participants, and incorporated direct (and typically objective) measures of unregulated selection or consumption.

We had no serious concerns about imprecision in relation to our estimates of the effects of exposure to larger (versus smaller) portion, package, individual unit or tableware size on unregulated selection or consumption of food, since (as noted above) effective sample sizes comfortably exceeded the numbers generated by conventional sample size calculations for single adequately powered trials (optimal information sizes). However, we did have serious concerns about imprecision in relation to our estimates of the effect of exposure to longer (versus shorter) cigarettes on consumption of tobacco, and of the effect of exposure to shorter, wider (versus taller, narrower) glasses or plastic bottles on consumption of non‐alcoholic beverages, based on consideration of both threshold optimal information sizes and confidence intervals.

Potential biases in the review process

Whilst it is possible that we may have failed to identify every study eligible for inclusion in this review, we took several steps to minimise this risk, including our use of highly sensitive search strategies and backward and forward citation searches. We therefore consider it improbable that we have failed to identify sufficient relevant evidence to substantively alter our conclusions. The scope, scale and complexity of this review and its analysis meant that we took the pragmatic decision (in consultation with the Cochrane Public Health Review Group) to defer full integration of 11 further eligible studies identified by the updated search (30 January 2015) (Bajaj 2014; Haire 2014; Kral 2014; Marchiori 2014; Rolls 2014a; Smith 2013a; van Ittersum 2013; van Kleef 2014; Wansink 2013; Wansink 2014; Williams 2014), until the first major update of this review. However, the results of preliminary analyses of outcome data that could provisionally be extracted from each of these 11 further eligible studies (see Appendix 2) establish that there is minimal potential for the full integration of these studies to change the interpretation of the results of this review, and hence its conclusions, as currently reported in the Results, Discussion and Authors' conclusions.

Agreements and disagreements with other studies or reviews

In a review of the effects of portion sizing published in 2014, Zlatevska and colleagues found that increasing portion size led to a small to moderate increase in consumption, reporting an effect size of d = 0.45 (Zlatevska 2014). This point estimate was similar to those we found in the current review and within its 95% confidence intervals. Results of moderator analyses conducted in Zlatevska and colleagues' review were again broadly consistent with our results. First, Zlatevska and colleagues similarly reported that the intervention effect was greater in adults than in children. Second, consistent with our findings regarding moderation by healthiness and by energy density of food, they reported a larger effect for snack foods (which are typically less healthy and more energy‐dense) than non‐snack foods. Contrary to the results of our analysis, however, they reported finding a larger effect among men than among women and a smaller effect among overweight participants than among participants who were not overweight. Discrepancies between the results of these analyses are expected since they used different data sets as a consequence of differences in their respective eligibility criteria, procedures and analytic methods. Although criteria for considering studies in Zlatevska and colleagues' review were broadly similar to those applied in this review, the former focused exclusively on food, did not appear to exclude studies in which participants' consumption was regulated by either explicit instructions or some other action of the researcher, and additionally included studies that measured intended but not actual consumption. Zlatevska and colleagues' review did not include coverage of evidence for the effects of package, individual unit or tableware size on consumption and did not investigate food selection as an outcome. Indeed, we are not aware of any relevant, previously published reviews that investigate either the effects of exposure to food packages or to individual food units of varying size (and only one that investigates dishware size ‐ see below in this section), nor that investigate food selection as an outcome.

We are aware of only one other systematic review, published in 2013 (Small 2013), which ‐ like ours and Zlatevska and colleagues' reviews (Zlatevska 2014) ‐ encompassed evidence for the effects of exposure to food portions of varying size on energy intake among well and normally developing children. Small and colleagues aggregated evidence from six eligible primary studies ‐ all randomised controlled trials that are fully incorporated into our review (Fisher 2003; Fisher 2007a; Fisher 2007b; Fisher 2007c; Rolls 2000; Spill 2010) ‐ using a narrative synthesis and reported a similar finding: that larger served portions resulted in greater daily energy intake among participants (Small 2013).

In a review of the effect of dishware size on consumption of food published in 2014, Robinson and colleagues reported results consistent with no effect of dishware size on consumption (standardised mean difference (SMD) ‐0.18, 95% confidence interval (Cl) ‐0.35 to 0.00, P value = 0.05) ‐ although we note that the authors reported "a small effect that was not statistically significant", with exposure to larger dishware leading to greater consumption (Robinson 2014). Although this review again differed from ours with respect to its inclusion criteria (for example, non‐randomised studies were eligible and targets of the manipulation were restricted to bowl size or plate only), its estimate of this effect overlaps considerably with our corresponding estimate for the effect of tableware size on consumption (see Figure 7).

Figure 1

Figure 2

PRISMA study flow diagram.

Figure 3

Figure 4

Effect sizes re‐expressed using familiar metrics

Figure 5

Figure 6

Figure 7

Summary effect sizes (standardised mean differences) in subgroups of studies (consumption outcome)

Figure 8

Figure 9

Figure 10

Summary effect sizes (standardised mean differences) in subgroups of studies (selection outcome)

Figure 11

Summary of findings for the main comparison. Food: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected

Food: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected
Population: children and adults Settings: high‐income countries, laboratory and field settings Intervention: larger‐sized portion, package, individual unit or item of tableware Comparison: smaller‐sized portion, package, individual unit or item of tableware
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	Smaller‐sized portion, package, individual unit or item of tableware	Larger‐sized portion, package, individual unit or item of tableware
Consumption	Mean daily energy intake from food among a representative sample of UK children and adults is 1689 kcal³	Mean daily energy intake from food would be 189 kcal (11.2%) higher with the intervention (144 to 228 kcal higher) among UK children and adults	Mean consumption in the intervention group was 0.38 standard deviations higher (0.29 higher to 0.46 higher)	6603 (86 independent comparisons)	⊕⊕⊕⊝ MODERATE ¹
‐ Consumption among children	Mean daily energy intake from food among a representative sample of UK children is 1651 kcal³	Mean daily energy intake from food would be 95 kcal (5.7%) higher with the intervention (45 to 140 kcal higher) among UK children	Mean consumption in the intervention group was 0.21 standard deviations higher (0.1 higher to 0.31 higher)	1421 (22 independent comparisons)	⊕⊕⊕⊝ MODERATE ¹
‐ Consumption among adults	Mean daily energy intake from food among a representative sample of UK adults is 1727 kcal³	Mean daily energy intake from food would be 247 kcal (14.3%) higher with the intervention (215 to 279 kcal higher) among UK adults	Mean consumption in the intervention group was 0.46 standard deviations higher (0.40 higher to 0.52 higher)	5182 (64 independent comparisons)	⊕⊕⊕⊝ MODERATE ¹
Selection without purchase	Mean daily energy intake from food among a representative sample of UK children and adults is 1689 kcal³	Mean daily energy intake from food would be 209 kcal (12.4%) higher with the intervention (119 to 293 kcal higher) among UK children and adults⁴	Mean selection without purchase in the intervention group was 0.42 standard deviations higher (0.24 higher to 0.59 higher)	1164 (13 independent comparisons)	⊕⊕⊕⊝ MODERATE ¹
‐ Selection without purchase among children	Mean daily energy intake from food among a representative sample of UK children is 1651 kcal³	Mean daily energy intake from food would be 63 kcal (3.8%) higher with the intervention (27 to 153 kcal higher) among UK children⁴	Mean selection without purchase in the intervention group was 0.14 standard deviations higher (0.06 lower to 0.34 higher)	382 (4 independent comparisons)	⊕⊕⊝⊝ LOW ^1,2
‐ Selection without purchase among adults	Mean daily energy intake from food among a representative sample of UK adults is 1727 kcal³	Mean daily energy intake from food would be 188 kcal (10.9%) higher with the intervention (188 to 403 kcal higher) among UK adults⁴	Mean selection without purchase in the intervention group was 0.55 standard deviations higher (0.35 higher to 0.75 higher)	782 (9 independent comparisons)	⊕⊕⊕⊝ MODERATE ¹
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in representative UK samples³ and the relative effect of the intervention (and its 95% CI). CI: confidence interval
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹Rated down by one level for study limitations: we assessed risk of bias as unclear or high in all incorporated studies. ²Rated down by one level for imprecision: number of participants (effective sample size) incorporated into analysis is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial (optimal information size) and the confidence interval crosses zero. ³Estimates of means and standard deviations based on an unweighted analysis of data from the UK National Diet and Nutrition Survey, Years 1‐4 (National Centre for Social Research 2012) ‐ see Data synthesis. ⁴Illustration of equivalent absolute effect on daily energy intake from food assumes that all foods selected are consumed.

Summary of findings for the main comparison. Food: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected

Summary of findings 2. Alcohol: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected

Alcohol: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected
Population: children and adults Settings: high‐income countries, laboratory and field settings Intervention: larger‐sized portion, package, individual unit or item of tableware Comparison: smaller‐sized portion, package, individual unit or item of tableware
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	Smaller‐sized portion, package, individual unit or item of tableware	Larger‐sized portion, package, individual unit or item of tableware
Consumption	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Consumption among children	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Consumption among adults	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
Selection with or without purchase	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Selection with or without purchase among children	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Selection with or without purchase among adults	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.

Summary of findings 2. Alcohol: Larger versus smaller‐sized portions, packages or tableware for changing quantity consumed or selected

Summary of findings 3. Tobacco: Longer versus shorter cigarettes for changing quantity consumed or selected

Tobacco: Longer versus shorter cigarettes for changing quantity consumed or selected
Population: children and adults Settings: high‐income countries, laboratory settings Intervention: longer cigarettes Comparison: shorter cigarettes
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	Shorter cigarettes	Longer cigarettes
Consumption	Mean number of cigarettes smoked per day among a representative sample of UK adults is 13	Mean number of cigarettes smoked per day would be 2 higher with the intervention (1 to 5 higher) among UK adults	Mean consumption in the intervention group was 0.25 standard deviations higher (0.14 lower to 0.65 higher)	108 (6 independent comparisons)	⊕⊕⊝⊝ LOW ^1,2	‐
‐ Consumption among children	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Consumption among adults	Mean number of cigarettes smoked per day among a representative sample of UK adults is 13	Mean number of cigarettes smoked per day would be 2 higher with the intervention (1 to 5 higher) among UK adults	Mean consumption in the intervention group was 0.25 standard deviations higher (0.14 lower to 0.65 higher)	108 (6 independent comparisons)	⊕⊕⊝⊝ LOW ^1,2	‐
Selection with or without purchase	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Selection with or without purchase among children	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
‐ Selection with or without purchase among adults	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹Rated down by one level for study limitations: we assessed risk of bias as unclear or high in all incorporated studies. ²Rated down by one level for imprecision: number of participants (effective sample size) incorporated into analysis is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial (optimal information size) and confidence interval crosses zero. ³Estimates of means and standard deviations based on an unweighted analysis of data from the UK Opinions and Lifestyle Survey, 2012 (Office for National Statistics 2012) ‐ see Data synthesis.

Summary of findings 3. Tobacco: Longer versus shorter cigarettes for changing quantity consumed or selected

Summary of findings 4. Food: Shorter, wider versus taller, narrower glasses or plastic bottles (shape) for changing quantity of non‐alcoholic beverages consumed or selected

Shorter, wider versus taller, narrower glasses or plastic bottles (shape) for changing quantity of non‐alcoholic beverages consumed or selected
Patient or population: children and adults Settings: high‐income countries, field settings Intervention: shorter, wider glasses or plastic bottles Comparison: taller, narrower glasses or plastic bottles
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	Shorter, wider glasses or plastic bottles	Taller, narrower glasses or plastic bottles
Consumption	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve among a representative sample of UK adults is 245 grams⁸	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve would be 195 grams (79.6%) higher with the intervention (95 to 296 grams higher) among UK adults	Mean consumption in the intervention group was 1.17 standard deviations higher (0.57 higher to 1.78 higher)	50 (1 independent comparison)	⊕⊝⊝⊝ VERY LOW ^1,2	‐
‐ Consumption among adults	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve among a representative sample of UK adults is 245 grams⁸	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve would be 195 grams (79.6%) higher with the intervention (95 to 296 grams higher) among UK adults	Mean consumption in the intervention group was 1.17 standard deviations higher (0.57 higher to 1.78 higher)	50 (1 independent comparison)	⊕⊝⊝⊝ VERY LOW ^1,2	‐
‐ Consumption among children	No evidence is available	‐	‐	(0 independent comparisons)	‐	‐
Selection without purchase	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve among a representative sample of UK children and adults is 234 grams⁸	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve would be 242 grams (103.4%) higher with the intervention (86 to 400 grams higher) among UK children and adults⁹	Mean selection without purchase in the intervention group was 1.47 standard deviations higher (0.52 higher to 2.43 higher)	232 (3 independent comparisons)	⊕⊕⊝⊝ LOW ^3,4	‐
‐ Selection without purchase among children	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve among a representative sample of UK children is 228 grams⁸	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve would be 377 grams (165.5%) higher with the intervention (292 to 462 grams higher) among UK children⁹	Mean selection without purchase in the intervention group was 2.31 standard deviations higher (1.79 higher to 2.83 higher)	96 (1 independent comparison)	⊕⊕⊝⊝ LOW ^{5, 6}	‐
‐ Selection without purchase among adults	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve among a representative sample of UK adults is 245 grams⁸	Mean quantity of energy‐containing non‐alcoholic beverages consumed in a single serve would be 171 grams (70.1%) higher with the intervention (68 to 274 grams higher) among UK adults⁹	Mean selection without purchase in the intervention group was 1.03 standard deviations higher (0.41 higher to 1.65 higher)	136 (2 independent comparisons)	⊕⊕⊝⊝ LOW ^3,7	‐
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹Rated down two levels for study limitations: study assessed at high risk of bias with respect to the consumption outcome (see Characteristics of included studies 'Risk of bias' tables). ²Rated down one level for imprecision: number of participants (effective sample size) incorporated into analysis is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial (optimal information size) based on the lower limit of the confidence interval. ³Rated down one level for study limitations: studies assessed at unclear or high risk of bias with respect to the selection outcome (see Characteristics of included studies 'Risk of bias' tables). ⁴Rated down one level for inconsistency. I² statistic from the random‐effects model shows that 90.1% of the total variance in study‐level estimates of this effect was due to statistical heterogeneity. ⁵Rated down one level for study limitations: study assessed at unclear risk of bias with respect to the selection outcome (see Characteristics of included studies 'Risk of bias' tables). ⁶Rated down one level for imprecision: single study. ⁷Rated down one level for inconsistency: point estimates are dissimilar and confidence intervals do not overlap. ⁸Estimates of means and standard deviations based on an unweighted analysis of data from the UK National Diet and Nutrition Survey, Years 1‐4 (National Centre for Social Research 2012) ‐ see Data synthesis. ⁹Illustration of equivalent absolute effect on quantity of energy‐containing non‐alcoholic beverages consumed in single serve assumes that all energy‐containing non‐alcoholic beverage selected in a single serve is consumed.

Summary of findings 4. Food: Shorter, wider versus taller, narrower glasses or plastic bottles (shape) for changing quantity of non‐alcoholic beverages consumed or selected

Table 1. Record of conceptual model development

Construct	Variable description (type)	Category	Included in provisional conceptual model?	Included in final conceptual model?	Included study first encountered	Other included studies encountered	Rationale for inclusion in final conceptual model	Supporting evidence	Rationale for exclusion from final conceptual model
Study design	Randomised controlled trial or cluster‐randomised controlled trial (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Study design	Between subjects or within‐subjects design (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Study/ intervention setting	Laboratory or field setting (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Study/ intervention setting	Selecting/consuming alone or selecting/consuming with others	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Product type	Food or tobacco (or alcohol ‐ no studies) (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Healthiness of manipulated product(s) (food products only)	FSA Nutrient Profile Score (Continuous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Basis for calculating healthiness of manipulated product(s) (food products only)	Specific product or product category (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Energy density of manipulated product(s) (food products only)	Energy density points from FSA Nutrient Profile model (Continuous)	Study characteristic	No	Yes	Devitt 2004 (study includes concurrent manipulation of energy density)	Kral 2004a, Fisher 2007b, Leahy 2008, Looney 2011, Rolls 2006b (studies include concurrent manipulation of energy density)	Evidence from previous studies that the energy density of food can exert independent and combined influences on energy intake suggests that this has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Kral 2004a, Kral 2004b, Rolls 2009, Bell 1998, Rolls 1999, Rolls 2006b	N/A
Target of manipulation	Portion, package, individual unit, package with individual unit, or tableware (Categorical, nominal)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Type of manipulation	Size (including volume) or shape manipulation (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	Post‐hoc decision taken to conduct separate meta‐analyses for size and shape since comparisons of size were not judged conceptually comparable to comparisons of shape among the set of studies included in this review Therefore no longer conceptualised as a potential effect modifier
Manipulation from a standard size	No or yes (Categorical, dichotomous)	Study characteristic	Yes	No	N/A	N/A	N/A	N/A	In practice it was rarely possible to code this variable based on information in study reports, and not judged practicable to code with reference to data from external sources
If applicable, direction of the change relative to standard size	Smaller or larger (Categorical, dichotomous)	Study characteristic	Yes	No	N/A	N/A	N/A	N/A	In practice it was rarely possible to code this variable based on information in study reports, and not judged practicable to code with reference to data from external sources
Selection without purchasing or selection with purchasing	Selection without purchasing or selection with purchasing (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Duration of exposure to the intervention	≤ 1 day or > 1 day (Categorical, dichotomous)	Study characteristic	No	Yes	N/A – Added based on discussion of collected data between 2 review authors (GJH and IS, April 2014), which identified duration of exposure as a variant characteristic of included studies (in addition to timing of outcome measurement, which had been included in our provisional conceptual model)	N/A	Duration of exposure to larger portions, packages, individual units or tableware has the potential to modify any effects of such exposure on the selection and consumption of food	Rolls 2006a, Rolls 2007a	N/A
Relationship between manipulated product(s) and consumption/ selection outcomes (food products only)	The manipulated foods comprise all of those in the study and all are selected or consumed ad libitum (Dummy)	Study characteristic	No	Yes	N/A – Added based on discussion of collected data between 2 review authors (GJH and IS, April 2014), which identified duration of exposure as a variant characteristic of included studies	N/A	This relationship may have the potential to modify any effects of such exposure on the selection and consumption of food. This is because providing any additional foods for consumption beyond those manipulated may result in additional energy consumption in either or both conditions. Given potential ceiling effects on total consumption, this could modify any intervention effect	‐	N/A
Relationship between manipulated product(s) and consumption/ selection outcomes (food products only)	The manipulated foods are only a subset of all the foods in the study and there are other non‐manipulated foods that are compulsory to select or consume (Dummy)	Study characteristic	No	Yes	N/A – Added based on discussion of collected data between 2 review authors (GJH and IS, April 2014), which identified duration of exposure as a variant characteristic of included studies	N/A	This relationship may have the potential to modify any effects of such exposure on the selection and consumption of food. This is because providing compulsory additional foods beyond those manipulated would result in additional energy consumption in both conditions. Given potential ceiling effects on total consumption, this could attenuate any intervention effect	‐	N/A
Relationship between manipulated product(s) and consumption/ selection outcomes (food products only)	The manipulated foods are only a subset of all the foods in the study and there are other non‐manipulated foods in study that are selected or consumed ad libitum (Dummy)	Study characteristic	No	Yes	N/A – Added based on discussion of collected data between 2 review authors (GJH and IS, April 2014), which identified duration of exposure as a variant characteristic of included studies	N/A	This relationship may have the potential to modify any effects of such exposure on the selection and consumption of food. This is because providing additional foods to be consumed ad libitum beyond those manipulated may result in additional energy consumption in either or both conditions. Given potential ceiling effects on total consumption, this could modify any intervention effect	‐	N/A
Relationship between manipulated product(s) and consumption/ selection outcomes (food products only)	Outcome data maps directly onto the manipulated food(s) (as opposed to a wider set of foods, including but not limited to manipulated food(s)) (Dummy)	Study characteristic	No	Yes	N/A – Added based on discussion of collected data between 2 review authors (GJH and IS, April 2014), which identified duration of exposure as a variant characteristic of included studies	N/A	This relationship may have the potential to modify any effects of such exposure on the selection and consumption of food. This is because including any additional foods in outcome measurement beyond those manipulated may result in additional energy consumption being measured in either or both conditions	‐	N/A
Concurrent intervention component(s)	Absent or present (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socio‐economic status context	Low deprivation or high deprivation (Categorical, dichotomous)	Study characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Magnitude of the absolute difference in size	Difference between larger size and smaller size in grams (Continuous)	Intervention characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Magnitude of the relative difference in size	Larger size expressed as a proportion (%) of smaller size (Continuous)	Intervention characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Age	Average (mean) age in years among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Gender	Proportion (%) of study completers who were female (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Ethnicity	Proportion (%) of study completers of white ethnicity (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body mass index (BMI)	Average (mean) BMI among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body mass index (BMI)	Average (mean) BMI‐z score among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body weight	Average (mean) weight in kilograms among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body weight status	Average (mean) percentage (%) body fat among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body weight status	Proportion (%) of study completers who were overweight (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body weight status	Proportion (%) of study completers who were obese (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Body weight status	Proportion (%) of study completers who were overweight or obese (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: dietary restraint	Average (mean) dietary restraint score among study completers ‐ Three Factor Eating Questionnaire (Stunkard 1985) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: dietary restraint	Average (mean) dietary restraint score among study completers ‐ Dutch Eating Behaviour Questionnaire (Van Strien 1986) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: dietary restraint	Average (mean) dietary restraint score among study completers ‐ Restraint Scale (Herman 1980) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: dietary disinhibition	Average (mean) dietary disinhibition score among study completers ‐ Three Factor Eating Questionnaire (Stunkard 1985) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: dietary disinhibition	Average (mean) dietary disinhibition score among study completers ‐ Dutch Eating Behaviour Questionnaire (Van Strien 1986) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Behavioural characteristics: external eating	Average (mean) external eating score among study completers ‐ Dutch Eating Behaviour Questionnaire (Van Strien 1986) (Continuous)	Participant characteristic	No	Yes	Hermans 2012	Kelly 2009, Kral 2004a	External eating (which measures the tendency to eat in response to external food‐related cues such as the sight, taste, and smell of attractive food) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Herman 2008, Burton 2007, Rodin 1981	N/A
Behavioural characteristics: emotional eating	Average (mean) emotional eating score among study completers ‐ Dutch Eating Behaviour Questionnaire (Van Strien 1986) (Continuous)	Participant characteristic	No	Yes	Kelly 2009	Kral 2004a	Emotional eating (which measures the tendency to eat in response to emotions such as anxiety, disappointment or boredom) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Van Strien 1986, Wallis 2009	N/A
Behavioural characteristics: susceptibility to hunger	Average (mean) hunger score among study completers – Three factor eating questionnaire (Stunkard 1985) (Continuous)	Participant characteristic	No	Yes	Flood 2006	Kral 2004a, Rolls 2002, Rolls 2004a, Rolls 2004b, Rolls 2006a, Rolls 2006b, Rolls 2007a, Rolls 2007b (S1), Rolls 2007b (S2), Rolls 2007b (S3), Rolls 2010a (E1), Rolls 2010b (E2)	Susceptibility to hunger (predisposition to feelings of hunger) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Provencher 2003, Lindroos 1997	N/A
Behavioural characteristics: plate cleaning tendency	Average (mean) plate cleaning tendency score among study completers ‐ 7‐point agreement scale anchored (‐3) strongly disagree and (+3) strongly agree (Marchiori 2012a, Wansink 2005e) (Continuous)	Participant characteristic	No	Yes	Marchiori 2012a	‐	Plate cleaning tendency (the tendency for a person to consume all the food presented to them) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Wansink 2005e	N/A
Behavioural characteristic: plate cleaning tendency	Behavioural characteristic ‐ Proportion (%) of adult study completers who often or always clean the plate (Continuous)	Participant characteristic	No	Yes	Rolls 2004a	‐	Plate cleaning tendency (the tendency for a person to consume all the food presented to them) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Wansink 2005e	N/A
Behavioural characteristic: plate cleaning tendency	Behavioural characteristic ‐ Proportion (%) of child study completers who often or always clean the plate (Continuous)	Participant characteristic	No	Yes	Rolls 2004a	‐	Plate cleaning tendency (the tendency for a person to consume all the food presented to them) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Wansink 2005e	N/A
Behavioural characteristics: consumption monitoring	Average (mean) consumption monitoring score among study completers ‐ 7‐point agreement scale anchored (‐3) strongly disagree and (+3) strongly agree (Continuous)	Participant characteristic	No	Yes	Marchiori 2012a	‐	Consumption monitoring (the tendency for a person to pay attention to and monitor the food they are consuming) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Polivy 1986	N/A
Behavioural characteristics: binge eating	Average (mean) binge eating score among study completers ‐ Eating Disorders Examination (Fairburn 1993) (Continuous)	Participant characteristic	No	Yes	Marchiori 2012a	‐	Binge eating (discrete episodes of eating during which the amount consumed is unusually large and there is a sense of loss of control over eating at the time) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Fairburn 1993	N/A
Behavioural characteristics: binge eating	Average (mean) binge eating score among study completers – Binge Eating Questionnaire (Gormally 1982) (Continuous)	Participant characteristic	No	Yes	Stroebele 2009	‐	Binge eating (discrete episodes of eating during which the amount consumed is unusually large and there is a sense of loss of control over eating at the time (Fairburn 1993)) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Fairburn 1993, Cooper 2003	N/A
Behavioural characteristics: dieting behaviour	Average (mean) dieting behavior score – Eating Attitude Test (EAT‐26) (Garner 1982) (Continuous)	Participant characteristic	No	Yes	Marchiori 2012a	Rolls 2002, Rolls 2004a, Rolls 2004b, Rolls 2007a	Dieting behaviour (behaviour that involves a person restricting themselves to smaller amounts or specific types of food either to lose weight or for medical reasons) has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Van Strien 1986, Stunkard 1985	N/A
Behavioural characteristics: mood	Average (mean) mood score among study completers ‐ 7‐point agreement scale anchored (‐3) strongly disagree and (+3) strongly agree (Marchiori 2012a, Reinbach 2010) (Continuous)	Participant characteristic	No	Yes	Marchiori 2012a	‐	Mood has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Gardner 2014	N/A
Behavioural characteristics: habitual dietary energy intake	Average (mean) dietary energy intake per diem among study completers in kcal (Continuous)	Participant characteristic	No	Yes	Ahn 2010	Ebbeling 2007	Baseline level of dietary energy intake has the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Fyfe 2010, Birch 1991	N/A
Behavioural characteristics: habitual dietary macronutrient intake, Carbohydrate	Average (mean) carbohydrate intake as a proportion (%) of daily energy intake among study completers (Continuous)	Participant characteristic	No	Yes	Ahn 2010	‐	Baseline levels of macronutrient intake have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Beasley 2009, Monteleone 2003, Yeomans 2001, Rolls 1988	N/A
Behavioural characteristics: habitual dietary macronutrient intake, Protein	Average (mean) protein intake as a proportion (%) of daily energy intake among study completers (Continuous)	Participant characteristic	No	Yes	Ahn 2010	‐	Baseline levels of macronutrient intake have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Beasley 2009, Rolls 1988	N/A
Behavioural characteristics: habitual dietary macronutrient intake, Fat	Average (mean) fat intake as a proportion (%) of daily energy intake among study completers (Continuous)	Participant characteristic	No	Yes	Ahn 2010	‐	Baseline levels of macronutrient intake have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Beasley 2009, Brennan 2012, Monteleone 2003, Yeomans 2001, Rolls 1988	N/A
Behavioural characteristics: physical activity	Average (mean) daily total number of steps among study completers (Continuous)	Participant characteristic	No	Yes	Ahn 2010	‐	Baseline levels of physical activity have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Martins 2007	N/A
Behavioural characteristics: habitual energy expenditure	Average (mean) daily energy expenditure among study completers in kcal (Continuous)	Participant characteristic	No	Yes	Rolls 2006a	Rolls 2006b, Rolls 2007a, Rolls 2007b (S1), Rolls 2007b (S2), Rolls 2007b (S3), Rolls 2010a (E1), Rolls 2010b (E2)	Baseline levels of energy expenditure have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Martins 2007	N/A
Behavioural characteristics: habitual physical exercise	Average (mean) number of hours of physical exercise completed among study completers per week (Continuous)	Participant characteristic	No	Yes	Marchiori 2012c	‐	Baseline levels of physical exercise have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Martins 2007	N/A
Biological state: hunger	Average (mean) hunger rating among study completers – 100 mm visual analogue scale (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Biological state: hunger	Average (mean) hunger rating among study completers ‐ 3‐point rating scale (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Biological state: hunger	Average (mean) hunger rating among study completers ‐ 7‐point rating scale (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Biological state: appetitive state, Fullness	Average (mean) fullness rating among study completers – 100 mm visual analogue scale (Continuous)	Participant characteristic	No	Yes	Shah 2011	‐	Baseline levels of feelings of fullness (specific somatic sensation or perceived general state of fullness (Blundell 2010)) have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Doucet 2008	N/A
Biological state: appetitive state, Satiety	Average (mean) satiety rating among study completers – 100 mm visual analogue scale (Continuous)	Participant characteristic	No	Yes	Shah 2011	‐	Baseline levels of feelings of satiety (specific somatic sensation or perceived general state of being satiated (Blundell 2010)) have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Lemmens 2011	N/A
Biological state: appetitive state, Prospective consumption	Average (mean) prospective consumption rating among study completers – 100 mm visual analogue scale (Continuous)	Participant characteristic	No	Yes	Shah 2011	‐	Baseline levels of prospective consumption (how much participants felt they could eat now (Shah 2011)) have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption of food	Doucet 2008	N/A
Other clinical characteristics: depression	Average (mean) depression score among study completers ‐ Zung Depression Inventory (Zung 1986) (Continuous)	Participant characteristic	No	Yes	Rolls 2002	Rolls 2004a, Rolls 2004b, Rolls 2007a	Baseline feelings of depression (or of affective, psychological or somatic symptoms associated with depression) have the potential to modify any effects of larger portions, packages, individual units or tableware on the selection and consumption	Grossniklaus 2010	N/A
Socioeconomic status: occupational status	Proportion (%) of study completers in employment (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: occupational status	Proportion (%) of study completers with a parent or caregiver in employment (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: education	Average (mean) number of years of education completed among study completers (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: education	Proportion (%) of study completers who completed at least some further education (greater than high school, at least some college) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: education	Proportion (%) of study completers with a parent or caregiver who completed at least some further education (greater than high school, at least some college) (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: education	Proportion (%) of study completers with a parent or caregiver who completed at least a 4‐year university degree (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: income	Proportion (%) of study completers with an individual income > USD 50,000 (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Socioeconomic status: income	Proportion (%) of study completers with a total family income > USD 50,000 (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Other measures of socioeconomic status: food insecurity	Proportion (%) of study completers living in a food insecure household (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Other measures of socioeconomic status: welfare receipt	Proportion (%) of study completers participating in the US National School Lunch Program (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Other measures of socioeconomic status: welfare receipt	Proportion (%) of study completers participating in the US School Nutrition Assistance Program (Continuous)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A
Overall (summary) risk of bias	Low risk, unclear risk or high risk (Categorical, nominal)	Participant characteristic	Yes	Yes	N/A	N/A	N/A	N/A	N/A

Table 1. Record of conceptual model development