The primary input to most bedr methods is a regions object, which is represented as either an R vector of multiple region strings as illustrated below or a data.frame of regions with three columns: chr, start, and end. The regions object returned by various bedr methods matches the input format; vector or data.frame. Here we briefly summarize a subset of key bedr functionalities. For further details on a range of bedr utilities, please see package’s help and vignettes for detailed examples and workflows.
Sort & merge
This functionality enables sorting of genomic regions in both natural and lexographical order using R, unix, BEDTools and BEDOPS engines. Following examples demonstrate the usage of these engines:
The above code will generate the following outputs of sorted regions:
-
# natural sort (unix)
-
"chr1:10-100" "chr1:101-200"
-
"chr1:200-210" "chr1:211-212"
-
"chr2:10-50" "chr2:40-60"
-
"chr10:50-100" "chr20:1-5"
-
# lexicographical sort (R)
-
"chr1:10-100" "chr1:101-200"
-
"chr1:200-210" "chr1:211-212"
-
"chr10:50-100" "chr2:10-50"
-
"chr2:40-60" "chr20:1-5"
-
# lexicographical sort (bedtools)
-
"chr1:10-100" "chr1:101-200"
-
"chr1:200-210" "chr1:211-212"
-
"chr10:50-100" "chr2:10-50"
-
"chr2:40-60" "chr20:1-5"
-
# lexicographical sort (bedops)
-
"chr1:10-100" "chr1:101-200"
-
"chr1:200-210" "chr1:211-212"
-
"chr10:50-100" "chr2:10-50"
-
"chr2:40-60" "chr20:1-5"
As shown above, various types of sorting results are presented in a similar R data structures regardless of which sorting engine is used (unix, R, bedtools or bedops) and their respective output style. Also, BEDTools and BEDOPS do not support natural sorting, and if method = “natural” is requested with these two engines, bedr automatically defaults to using engine = “unix” of “R” to perform sorting. Note, sorting of large number of regions through R will be slow and may also result in high memory overhead.
Much of the command-line interaction with BEDTools and BEDOPS is performed through temporary files followed by efficient piping/parsing of the output straight into R data structures. This ensures that memory intensive sorting tasks (or any other genomic operations discussed below) are managed by the optimized engines such as (BEDTools or BEDOPS), and therefore memory operations in R are limited to subsequent parsing of output.
In addition to sort operations, bedr also supports identification of overlapping regions which can be collapsed to avoid downstream analytical challenges such as many:many join results (Fig. 2), e.g.
The above code will generate the following output of merged regions:
Sort and merge can be combined into one step given they are generally run as a tandem preprocessing step:
The above code will generate the following vector output of sorted and merged regions:
Join
This functionality enables joining two region-based datasets using intervals as an index or primary key. The output is left outer join with respect to the first regions object (Fig. 2), e.g.
-
regions.a <- bedr.merge.region(
-
regions.b <- bedr.merge.region(
-
regions.c <- bedr.merge.region(
-
bedr.join.region(
-
x = regions.a,
-
y = regions.b
-
)
The above code will generate the following output, containing regions of regions.a in the first column, while any overlapping regions from regions.b are listed in columns 2 to 4 (chr, start, end). Regions in regions.a with no overlap are encoded as: . and -1
|
index
|
V4
|
V5
|
V6
|
1
2
3
4
5
6
|
chr1:10-100
chr1:101-210
chr1:211-212
chr10:50-100
chr2:10-60
chr20:1-5
|
.
chr1
chr1
.
chr2
.
|
-1
111
111
-1
40
-1
|
-1
250
250
-1
60
-1
|
Similarly, another bedr function bedr.join.multiple.region() supports merging of multiple sets of regions (Fig. 2), e.g.
The above code will generate the output data.frame shown below. The table lists all the sub-regions and their presence across the three sets of region objects (regions.a, regions.b, and regions.c) passed to the function. For instance, sub-region chr1:1-10 (column: index) overlaps with 2 region objects (b and c). This presence is shown as comma separated list in ‘names’ column as well as a truth table in the subsequent columns. The number of columns representing the truth table will match the number of region objects passed to the function bedr.join.multiple.region().
index n.overlaps names a b c
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
|
chr1:1-10
chr1:10-20
chr1:20-100
chr1:100-101
chr1:101-111
chr1:111-210
chr1:210-211
chr1:211-212
chr1:212-240
chr1:240-250
chr1:2000-2010
chr10:50-100
chr10:100-110
chr10:110-150
chr2:1-5
chr2:5-10
chr2:10-20
chr2:20-30
chr2:30-40
chr2:40-60
chr20:1-5
chr20:6-7
chr20:7-10
chr20:10-12
|
2
1
2
1
2
3
2
3
2
1
1
1
1
2
2
1
2
1
2
3
1
1
2
1
|
b,c 0 1 1
a 1 0 0
a,c 1 0 1
c 0 0 1
a,c 1 0 1
a,b,c 1 1 1
b,c 0 1 1
a,b,c 1 1 1
b,c 0 1 1
b 0 1 0
b 0 1 0
a 1 0 0
b 0 1 0
b,c 0 1 1
b,c 0 1 1
c 0 0 1
a,c 1 0 1
a 1 0 0
a,c 1 0 1
a,b,c 1 1 1
a 1 0 0
b 0 1 0
b,c 0 1 1
c 0 0 1
|
Subtract and intersect
The subtract utility identifies regions exclusive to first set of regions, and intersect function identifies sub-regions of first set which overlap with the second set of regions (Fig. 2), e.g.
-
bedr.subtract.region(
-
x = regions.a,
-
y = regions.b
-
)
The above code will generate the following output which lists the sub-regions exclusive to regions.a:
Intersect utility makes use of bed.join.region() and finds regions in the second set which overlap with the regions in the first set. An example is shown in the Results section “Join”. Similarly in.region(x = regions.a, y = regions.b) and its R style convenience operator %in.region% can be used to test (logical) presence of overlapping regions, e.g.
-
in.region(
-
x = regions.a,
-
y = regions.b
-
)
bedr also provides an interface to find overlapping regions using Tabix [7]. This can be done using the following bedr call:
which identifies regions overlapping with COSMIC coding mutations file resulting in the following data.frame (only first six rows are shown below):
|
CHROM
|
POS
|
ID
|
REF
|
ALT
|
QUAL
|
FILTER
|
1
2
3
4
5
6
|
1
1
1
1
1
1
|
69345
69523
69538
69539
69540
69569
|
COSM911918
COSM426644
COSM75742
COSM1343690
COSM1560546
COSM1599955
|
C
G
G
T
G
T
|
A
T
A
C
T
C
|
NA
NA
NA
NA
NA
NA
|
<NA>
<NA>
<NA>
<NA>
<NA>
<NA>
|
INFO
|
1
2
3
4
5
6
|
GENE=OR4F5;STRAND=+;CDS=c.255C>A;AA=p.I85I;CNT=1
GENE=OR4F5;STRAND=+;CDS=c.433G>T;AA=p.G145C;CNT=1
GENE=OR4F5;STRAND=+;CDS=c.448G>A;AA=p.V150M;CNT=1
GENE=OR4F5;STRAND=+;CDS=c.449T>C;AA=p.V150A;CNT=1
GENE=OR4F5;STRAND=+;CDS=c.450G>T;AA=p.V150V;CNT=1
GENE=OR4F5;STRAND=+;CDS=c.479T>C;AA=p.L160P;CNT=2
|
Third-party compatibility
Given that bedr can process regions data as R’s vector as well as data.frame data structure, it is easily transformable to other third-party sequence and region objects. For instance, bedr provides a utility adaptor to convert regions into BED data.frame as shown below:
which can be further converted to a widely compatible GRanges [4] object, as shown below:
The above code will create a GRanges object shown in the output below, which can be further customized/extended with additional annotations such as strand and genomic feature names.
GRanges object with 6 ranges
and 0 metadata columns:
|
|
seqnames
|
ranges
|
strand
|
<Rle>
|
<IRanges>
|
<Rle>
|
[1]
[2]
[3]
[4]
[5]
[6]
|
chr1
chr1
chr1
chr10
chr2
chr20
|
[10, 100]
[101, 210]
[211, 212]
[50, 100]
[10, 60]
[1, 5]
|
*
*
*
*
*
*
|
- - - - - - -
seqinfo: 4 sequences from an
unspecified genome;no seqlengths
|
To perform feature meta-analysis and annotation retrieval/conversion (see example workflow in Additional file 1), bedr facilitates downloads from UCSC [8], COSMIC [9] and HUGO [10] including reference genome annotations, repeat sequences, black lists and disease candidate features. Also, bedr has a fully integrated unit-testing framework allowing users to verify integrity of bedr functions when using customized development or installations.
Visualization
For results of common operations such as intersect, Venn diagrams of overlapping features between 2 to 5 sets of regions (2- to 5-way Venn diagrams) can be generated automatically [11]. The overlap criterion can be defined in a number of ways including unique intervals, gene length or user-specified size as a fraction of sub-region’s length, e.g.
-
bedr.plot.region(
-
input = list(
-
a = regions.a,
-
b = regions.b
-
),
-
feature = "bp",
-
fraction.overlap = 0.1
-
)
The above code will generate a base pair level overlap of sequence objects regions.a and regions.b, and show the results as a Venn diagram highlighting lengths of exclusive and overlapping regions as shown below:

Further, bedr output is ideally suited for alternative complex set visualization tools such as UpSetR [12] and Gviz [13].