A directed acyclic graph (DAG) is an effective means of presenting expert-knowledge assumptions when selecting adjustment variables in epidemiology. A DAG is made up of nodes (vertices) and edges, and every edge is directed: it points from one vertex to another. Unlike a cyclic graph, a DAG contains no directed cycles, so following the arrows can never lead back to the starting vertex.
Each edge, usually drawn as an arrow, is called a directed edge. The nodes, often drawn as ovals, include root nodes (with no incoming edges) and leaf nodes (with no outgoing edges). In epidemiology, DAGs are also commonly called causal diagrams. DAGs can model many different kinds of problems, and they are flexible and expressive enough that new inputs can be added to a model as it grows. Every DAG also has a topological ordering: its vertices can be arranged in a sequence in which every edge points from an earlier vertex to a later one. This ordering exists precisely because the graph has no cycles, and consequently any directed path through the graph visits each node at most once.
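The topological-ordering property can be made concrete with Kahn's algorithm, a standard way to compute such an ordering: repeatedly take a node with no remaining incoming arrows. The sketch below assumes the graph is stored as a dict mapping each node to its children; the function name and the example variables are illustrative only. If some nodes can never be released, they lie on or behind a cycle, so the graph is not a DAG.

```python
from collections import deque


def topological_order(edges: dict[str, list[str]]) -> list[str]:
    """Return one topological ordering of a directed graph (Kahn's algorithm).

    `edges` maps each node to the list of nodes its arrows point to.
    Raises ValueError if the graph contains a directed cycle, i.e., is not a DAG.
    """
    # Collect every node, including those that only appear as arrow heads.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)

    # Count the incoming arrows of each node.
    in_degree = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            in_degree[target] += 1

    # Start from the root nodes (no incoming arrows); sorted for a deterministic result.
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in edges.get(node, []):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                queue.append(target)

    # Nodes never released from the count sit on (or downstream of) a cycle.
    if len(order) != len(nodes):
        raise ValueError("graph contains a directed cycle, so it is not a DAG")
    return order


# Hypothetical example: smoking -> tar -> cancer, plus a direct smoking -> cancer arrow.
dag = {"smoking": ["tar", "cancer"], "tar": ["cancer"]}
print(topological_order(dag))  # ['smoking', 'tar', 'cancer']
```

Calling `topological_order({"a": ["b"], "b": ["a"]})` raises `ValueError`, because the two arrows form a cycle and no valid ordering exists.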
Causal diagrams have a long history of informal use: summarizing causal links in a graph or diagram has long served as an aid to causal analysis, for example in path analysis and structural equation modeling. More recently, the theory of directed acyclic graphs has been formally developed and applied in expert-system research and robotics.
Outside epidemiology, DAGs are also used in project management, where milestones placed on the graph make it possible to check work progress. Because of these milestones, DAGs can be used for very large projects, regardless of how many people are needed to perform a task. A DAG is also more expressive than a linear model, which may not show as much detail at first glance. DAGs help in analyzing results whose inputs depend on one another: when events occur, their relationships with each other can be represented by a DAG. DAGs also allow a compact, compressed representation of a set of sequences, because several subsequences can share a path when they follow the same route. Acyclic networks such as DAGs are essential in the design of electric circuits, especially combinational logic blocks, and data-flow operations rely on the same step-by-step, acyclic processing.
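To illustrate how a DAG of tasks supports progress tracking, the sketch below computes the critical path, the longest chain of dependent tasks, which sets the earliest possible end of the project. Task names, durations, and the dict-based representation are hypothetical; the sketch assumes nonnegative durations and dependencies that form a DAG (it raises an error if it meets a cycle).

```python
from typing import Optional


def critical_path(durations: dict[str, float], depends_on: dict[str, list[str]]):
    """Return (project length, critical path) for tasks whose dependencies form a DAG.

    durations maps each task (or milestone) to its duration; depends_on maps a task
    to the tasks that must finish before it can start.
    """
    finish: dict[str, float] = {}        # earliest finish time of each task
    via: dict[str, Optional[str]] = {}   # predecessor on the longest chain into each task
    visiting: set[str] = set()           # tasks on the current recursion stack

    def earliest_finish(task: str) -> float:
        if task in finish:
            return finish[task]
        if task in visiting:
            raise ValueError(f"dependency cycle through {task!r}; tasks must form a DAG")
        visiting.add(task)
        start, best = 0.0, None
        for pred in depends_on.get(task, []):
            pred_finish = earliest_finish(pred)
            # A task can start only after its slowest predecessor has finished.
            if best is None or pred_finish > start:
                start, best = pred_finish, pred
        visiting.discard(task)
        finish[task] = start + durations[task]
        via[task] = best
        return finish[task]

    # The project ends when the task with the latest earliest-finish time completes.
    last = max(durations, key=earliest_finish)
    path = []
    node: Optional[str] = last
    while node is not None:
        path.append(node)
        node = via[node]
    return finish[last], path[::-1]


# Hypothetical project: design precedes build and docs; build precedes test.
durations = {"design": 3, "build": 5, "test": 2, "docs": 1}
depends_on = {"build": ["design"], "test": ["build"], "docs": ["design"]}
length, path = critical_path(durations, depends_on)
print(length, path)  # 10.0 ['design', 'build', 'test']
```

Any delay on a task along the returned path delays the whole project, which is why such paths are natural places for milestones.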
In epidemiology, approaches to selecting adjustment variables can be grouped into background-knowledge-based and statistics-based approaches, and the DAG has become a core tool of the background-knowledge approach. Using a DAG, researchers can state their assumed relationships between variables graphically and, based on these assumptions, identify the variables to adjust for in order to control confounding and other biases [1-3].
An observed (empirical) association between an exposure X and an outcome Y could be due to:
1. X causes Y
2. Chance (which DAGs do not address)
3. X and Y share a common cause (or a correlate of a common cause), i.e., confounding
4. Conditioning on a common effect of X and Y ("collider bias", e.g., selection bias)
5. Measurement/classification error
6. Y causes X (i.e., “reverse-causality” bias)
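Explanations 3 and 4 can be illustrated with a small simulation, a hypothetical example using only Python's standard library. In the first scenario a common cause C produces an X-Y association although X has no effect on Y. In the second, X and Y are independent, but keeping only units with a high value of their common effect S (a crude stand-in for selection into a study) induces an association. The variable names, effect sizes, seed, and the selection threshold are all assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 50_000

# Explanation 3: a common cause C drives both X and Y; X has no effect on Y.
c = [rng.gauss(0, 1) for _ in range(n)]
x_conf = [ci + rng.gauss(0, 1) for ci in c]
y_conf = [ci + rng.gauss(0, 1) for ci in c]

# Explanation 4: X and Y are independent, but both cause S (a common effect, i.e., a collider).
x_col = [rng.gauss(0, 1) for _ in range(n)]
y_col = [rng.gauss(0, 1) for _ in range(n)]
s = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x_col, y_col)]
# "Condition" on S by keeping only units with S > 1 (e.g., those who enter a study).
kept = [(xi, yi) for xi, yi, si in zip(x_col, y_col, s) if si > 1]

r_confounded = pearson(x_conf, y_conf)
r_independent = pearson(x_col, y_col)
r_selected = pearson([k[0] for k in kept], [k[1] for k in kept])
print(f"common cause:         r = {r_confounded:.2f}")
print(f"independent X and Y:  r = {r_independent:.2f}")
print(f"after selecting on S: r = {r_selected:.2f}")
```

With this setup, r should come out near 0.5 in the first case (X and Y each share half their variance with C), near 0 for the unselected independent variables, and clearly negative (around -0.3) after selection, even though X and Y are causally unrelated in both scenarios.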
SOME PRACTICAL GUIDELINES
1) Draw the DAG, including all the relevant variables (both documented and undocumented) and, possibly, selection processes.
It's important that the DAG be causal, i.e., it should include all the common causes of any pair of variables in the DAG.
2) Identify (and write down) every path between the exposure of interest and the outcome of interest
2a) Classify each of the paths as either causal or non-causal
• All the arrows on a causal path point in the same direction, downstream from the exposure to the outcome.
• The arrows on a non-causal path change direction at least once.
2b) Classify each of the paths as either unconditionally open or closed
• The unconditionally open paths have no colliders on them.
• The unconditionally closed paths have at least 1 collider on them.
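Steps 2, 2a, and 2b can be sketched in a few lines of Python. The example DAG (a confounder C, a mediator M, and a collider S) and all function names are hypothetical. The DAG is stored as a dict mapping each node to its children, and paths are enumerated with arrow directions ignored, since step 2 asks for every path, not only directed ones.

```python
def points(edges, a, b):
    """True if the DAG has an arrow a -> b (edges maps each node to its children)."""
    return b in edges.get(a, [])


def all_paths(edges, exposure, outcome):
    """Every simple path from exposure to outcome, ignoring arrow direction."""
    adj = {}
    for parent, children in edges.items():
        for child in children:
            adj.setdefault(parent, set()).add(child)
            adj.setdefault(child, set()).add(parent)
    paths = []

    def extend(path):
        if path[-1] == outcome:
            paths.append(path)
            return
        for nxt in sorted(adj.get(path[-1], ())):
            if nxt not in path:  # a path never visits the same node twice
                extend(path + [nxt])

    extend([exposure])
    return paths


def classify(edges, path):
    """Step 2a (causal vs non-causal) and step 2b (unconditionally open vs closed)."""
    # Causal: every arrow points downstream, from the exposure towards the outcome.
    causal = all(points(edges, a, b) for a, b in zip(path, path[1:]))
    # A collider is an interior node that both neighboring arrows point into.
    has_collider = any(
        points(edges, prev, node) and points(edges, nxt, node)
        for prev, node, nxt in zip(path, path[1:], path[2:])
    )
    return ("causal" if causal else "non-causal"), ("closed" if has_collider else "open")


def show(edges, path):
    """Render a path with its arrows, e.g. 'X <- C -> Y'."""
    text = path[0]
    for a, b in zip(path, path[1:]):
        text += (" -> " if points(edges, a, b) else " <- ") + b
    return text


# Hypothetical DAG: C confounds X and Y, M mediates X -> Y, S is a common effect of X and Y.
dag = {"C": ["X", "Y"], "X": ["M", "S"], "M": ["Y"], "Y": ["S"]}
for path in all_paths(dag, "X", "Y"):
    kind, status = classify(dag, path)
    print(f"{show(dag, path)}: {kind}, {status}")
```

For this DAG the loop prints `X <- C -> Y: non-causal, open`, `X -> M -> Y: causal, open`, and `X -> S <- Y: non-causal, closed`. Only the path through the confounder C needs to be closed by conditioning.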
3) Identify all the minimally sufficient sets (see below)
A sufficient set is a set of variables that, when conditioned on (by any method), accomplishes both of the following goals:
• all the causal paths are open
• all the non-causal paths are closed
A minimally sufficient set is a sufficient set that has no proper subset that is also sufficient.
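A brute-force sketch of step 3 for small graphs: it checks subsets of candidate variables, smallest first, against the two goals, using the standard d-separation rules to decide whether a path is open once a set is conditioned on (a non-collider in the set closes a path; a collider closes it unless the collider or one of its descendants is in the set). As an added assumption, following the usual back-door convention, descendants of the exposure are never considered for adjustment. The example DAG and all function names are hypothetical, and the search is exponential in the number of variables, so it only suits small diagrams.

```python
from itertools import combinations


def descendants(edges, node):
    """All nodes reachable from node by following arrows."""
    seen, stack = set(), [node]
    while stack:
        for child in edges.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def paths(edges, start, end):
    """Simple paths from start to end, ignoring arrow direction."""
    adj = {}
    for a, children in edges.items():
        for b in children:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    found, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            found.append(path)
            continue
        for nxt in adj.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return found


def is_open(edges, path, z):
    """d-separation rule: is this path open when conditioning on the set z?"""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = node in edges.get(prev, []) and node in edges.get(nxt, [])
        if collider and node not in z and not (descendants(edges, node) & z):
            return False  # an unconditioned collider blocks the path
        if not collider and node in z:
            return False  # conditioning on a non-collider blocks the path
    return True


def minimally_sufficient_sets(edges, exposure, outcome):
    all_p = paths(edges, exposure, outcome)
    causal = [p for p in all_p if all(b in edges.get(a, []) for a, b in zip(p, p[1:]))]
    non_causal = [p for p in all_p if p not in causal]
    nodes = set(edges) | {c for cs in edges.values() for c in cs}
    # Assumption (back-door convention): never adjust for descendants of the exposure.
    candidates = sorted(nodes - {exposure, outcome} - descendants(edges, exposure))

    def sufficient(z):
        return (all(is_open(edges, p, z) for p in causal)
                and not any(is_open(edges, p, z) for p in non_causal))

    found = []
    for size in range(len(candidates) + 1):  # smallest sets first
        for combo in combinations(candidates, size):
            z = set(combo)
            # Minimal: sufficient, and no smaller sufficient set found so far is inside it.
            if sufficient(z) and not any(f <= z for f in found):
                found.append(z)
    return found


# Hypothetical DAG: X <- A -> B -> Y is a back-door path; X <- E -> M <- F -> Y has collider M.
dag = {"A": ["X", "B"], "B": ["Y"], "E": ["X", "M"], "F": ["M", "Y"], "X": ["Y"]}
print(minimally_sufficient_sets(dag, "X", "Y"))  # [{'A'}, {'B'}]
```

Either A or B alone closes the only open non-causal path. A set such as {A, M} would not be sufficient: M is a collider on X <- E -> M <- F -> Y, so conditioning on it opens that path.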
Ideally, the researcher should try to find all the minimally sufficient sets, because:
• some variables may not be documented at all
• some variables are documented with error
• variables differ in the information/precision costs of conditioning on them
• variables differ in the economic costs of documenting them.
Accordingly, among the minimally sufficient sets, one should prefer those consisting of variables that are documented with little error and have little or no missing data. Ultimately, the goal is to ensure that:
1) all the causal paths are open
2) all the non-causal paths are closed
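One way to act on the missing-data preference, sketched under a strong assumption: if missingness is independent across variables, the expected share of complete records for a set is the product of (1 - missing proportion) over its variables. The sets and missing-data proportions below are made up for illustration, and the sketch scores only missing data; measurement error and costs would need criteria of their own.

```python
from math import prod


def complete_case_fraction(variables, missing):
    """Expected share of records with every variable observed.

    Assumes (hypothetically) that missingness is independent across variables.
    """
    return prod(1 - missing[v] for v in variables)


def rank_sets(sets, missing):
    """Order candidate adjustment sets: most complete records first, then smaller sets."""
    return sorted(sets, key=lambda s: (-complete_case_fraction(s, missing), len(s), sorted(s)))


# Hypothetical minimally sufficient sets and proportion of missing data per variable.
minimal_sets = [{"A"}, {"B"}, {"C", "D"}]
missing = {"A": 0.20, "B": 0.02, "C": 0.01, "D": 0.05}
for s in rank_sets(minimal_sets, missing):
    print(sorted(s), round(complete_case_fraction(s, missing), 3))
```

Here {B} ranks first (about 98% complete records), ahead of {C, D} (about 94%) and {A} (80%).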
One tool for building and analyzing DAGs is DAGitty, a free, browser-based tool for creating, editing, and analyzing causal models. Because DAGitty focuses on causal analysis, its support for the general mathematical analysis of DAGs is limited; for example, it is of little use for calculating a critical path.
The HAMA tool is also used for DAG analysis and is especially helpful for graph-related problems. TETRAD is another tool for analyzing DAGs, though it has more limited functionality than DAGitty.
A directed graph has a topological ordering if and only if it has no cycles, so a directed graph that lacks one is not a DAG. With directed edges and no cycles, a DAG is well suited to analyzing data that follow a directional pathway. Tracing a path through a DAG can never return to a node already visited; if it could, the graph would contain a cycle, and in settings such as wait-for graphs of processes and resources, such a cycle indicates a deadlock.
1. Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia: Lippincott-Raven; 1998.
2. Szklo M, Nieto FJ. Epidemiology: Beyond the Basics. Gaithersburg, MD: Aspen; 2000.
3. MacMahon B, Trichopoulos D. Epidemiology: Principles and Methods. 2nd ed. Boston: Little, Brown and Co; 1996.