Traffic Journal ::002:: IPTables Analysis

Libraries

library(gganimate)
library(ggplot2)
library(reshape2)

Git repositories for extra packages reference:

https://github.com/thomasp85/gganimate

Local Sourcing

https://bcable.net/x/Rproj/shared

source("../../shared/load_recurse.R")
source("shared/load_varlog.R")
source("shared/parse_rawsplit.R")

source("shared/cleanup_logs.R")
source("shared/country_code_cleanup.R")
source("shared/geoip.R")
source("shared/heatmap_prep.R")
source("shared/turn_to_animation.R")
source("shared/world_mapper.R")

Config

site_name <- "bcable.net"
source("../../shared/paths.R")

Themes

theme_heatmap <- function(){
    theme_bw() %+replace% theme(
        axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        legend.margin = margin(0, 0, 0, 0, "cm"),
        legend.spacing = unit(0, "cm"),
        panel.border = element_blank(),
        panel.grid = element_blank(),
        panel.spacing = unit(0, "cm"),
        plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"),
        plot.title = element_text(
            face="bold", size=22,
            margin=margin(0.5, 0.5, 0.5, 0.5, "cm"), debug=FALSE
        )
    )
}
theme_simple <- function(){
    theme_bw() %+replace% theme(
        axis.text.x = element_text(angle=90, size=15)
    )
}

Boilerplate GeoIP Disclaimer

Geolocation based on IP address is not to be taken as entirely accurate as to the source of traffic or attacks conducted. There are many reasons for this, which include (but are not limited to):

Proxies, VPNs, and Tor

Large quantities of traffic, especially attack-based traffic, will use a VPN or the Tor network (or some reasonable facsimile) to mask the origin of the traffic. This in turn changes the apparent location of origin. Usually, an attacker will also intentionally want the traffic to appear to come from somewhere with weaker legal jurisdiction or less ability to police traffic, or from a well-known source of malicious attacks such as China or Russia.

For instance, I generated the following log entry against my own servers while sitting at my desk in the United States, but it gets geolocated as Russia because of how the packet was sent. This sort of masking is trivial to perform, even by a nine-year-old on a cellphone.

httpd_data[grep("/from/russia/with/logs", httpd_data$Request), c("Request", "Response.Code", "Country.Code")]

##                               Request Response.Code Country.Code
## 1 GET /from/russia/with/logs HTTP/1.1           404           RU

Vulnerable Servers and Botnets

Some locations will have a higher distribution of virtual servers than others, such as Silicon Valley or China. This can lead to larger quantities of vulnerable virtual machines and servers in those regions, which in turn can make more of the traffic appear to originate there.

Government Interference

It is possible that due to address assignment for governmental intelligence purposes or other economic or political reasons a nation could re-allocate address space and forge the identity similarly to a NAT (network address translation). They could also funnel information via VPN technologies for another nation.

Because most of these agreements are made in private, and due to the fact that most geolocation and WHOIS records are based on self-reporting, it is impossible to know the 100% true nature of geographic address assignment.

Weaknesses or errors in the MaxMind database or the rgeolocate package

This geolocation uses the rgeolocate package available on CRAN, and uses the internal country database that is shipped with it. There could be an error in the database shipped, there could be an error in the lookup code, etc. Bugs happen. I have no reason to believe that any false geolocation is being performed by these packages, however.
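
As a rough sketch of what such a lookup can look like, the snippet below calls rgeolocate's `maxmind()` against the country database bundled with the package. The helper name `lookup_country_code` is hypothetical and is not the `geoip()` wrapper sourced above, whose internals are not shown here. It returns NA when the package or its database is unavailable.

```r
# Hypothetical helper for illustration; not the geoip() function used in this post.
lookup_country_code <- function(ips) {
    # Fall back to NA when rgeolocate is not installed.
    if (!requireNamespace("rgeolocate", quietly=TRUE)) {
        return(rep(NA_character_, length(ips)))
    }
    # Country database bundled with the rgeolocate package.
    db <- system.file("extdata", "GeoLite2-Country.mmdb", package="rgeolocate")
    if (!nzchar(db)) {
        return(rep(NA_character_, length(ips)))
    }
    # maxmind() returns a data frame with one column per requested field.
    rgeolocate::maxmind(ips, db, fields="country_code")$country_code
}

codes <- lookup_country_code(c("8.8.8.8", "196.200.60.51"))
```

Any errors in the bundled database carry straight through to `codes`, which is the weakness described above.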

Final Note

Despite these weaknesses, looking at this sort of data can be quite fun and interesting, and potentially enlightening. Generalized conclusions should not be drawn from this data or the maps herein. You have been warned.

Load Syslog Files

messages_records <- load_varlog(path_syslog, "messages")
messages_records <- raw_populate(messages_records)
messages_records <- cleanup_syslog(messages_records)
ipt_data <- cleanup_iptables(messages_records)
messages_records$Raw.Split <- NA
ipt_data$Raw.Split <- NA

Date Min: 2018-10-28 03:09:03
Date Max: 2019-09-22 03:25:01

secure_records <- load_varlog(path_syslog, "secure")
secure_records <- raw_populate(secure_records)
secure_records <- cleanup_syslog(secure_records)
secure_records$Raw.Split <- NA

Date Min: 2018-10-28 04:03:23
Date Max: 2019-09-14 17:14:25

Interesting Logs in “secure”

Checking “POSSIBLE BREAK-IN ATTEMPT!” messages, they all appear to be innocuous enough (usually me logging in successfully 5 seconds later, so a typo in my password or somesuch). However, the following is interesting:

sub(
    "([0-9][0-9]:[0-9][0-9]:[0-9][0-9]) [^ ]+ ", "\\1 [REDACTED] ",
    secure_records$Raw[grepl("Bad protocol", secure_records$Raw)]
)
##  [1] "Dec 12 14:58:28 [REDACTED] sshd[3809]: Bad protocol version identification '\\003' from [IPREDACTED] port 863"    
##  [2] "Dec 12 14:58:28 [REDACTED] sshd[3809]: Bad protocol version identification '\\003' from [IPREDACTED] port 863"    
##  [3] "Dec 12 14:58:28 [REDACTED] sshd[3809]: Bad protocol version identification '\\003' from [IPREDACTED] port 863"    
##  [4] "Feb 8 01:06:05 [REDACTED] sshd[27407]: Bad protocol version identification '\\003' from [IPREDACTED] port 46318"
##  [5] "Feb 8 01:06:09 [REDACTED] sshd[27408]: Bad protocol version identification '\\003' from [IPREDACTED] port 53422"
##  [6] "Feb 12 06:41:24 [REDACTED] sshd[1511]: Bad protocol version identification '\\003' from [IPREDACTED] port 489"  
##  [7] "Feb 12 06:41:24 [REDACTED] sshd[1511]: Bad protocol version identification '\\003' from [IPREDACTED] port 489"  
##  [8] "Feb 12 06:41:24 [REDACTED] sshd[1511]: Bad protocol version identification '\\003' from [IPREDACTED] port 489"  
##  [9] "Mar 4 19:44:40 [REDACTED] sshd[772]: Bad protocol version identification '\\003' from [IPREDACTED] port 156"      
## [10] "Mar 6 04:56:50 [REDACTED] sshd[2936]: Bad protocol version identification '\\003' from [IPREDACTED] port 185"     
## [11] "Mar 10 00:10:25 [REDACTED] sshd[8761]: Bad protocol version identification '\\003' from [IPREDACTED] port 285"    
## [12] "Mar 4 19:44:40 [REDACTED] sshd[772]: Bad protocol version identification '\\003' from [IPREDACTED] port 156"      
## [13] "Mar 6 04:56:50 [REDACTED] sshd[2936]: Bad protocol version identification '\\003' from [IPREDACTED] port 185"     
## [14] "Mar 10 00:10:25 [REDACTED] sshd[8761]: Bad protocol version identification '\\003' from [IPREDACTED] port 285"    
## [15] "Mar 4 19:44:40 [REDACTED] sshd[772]: Bad protocol version identification '\\003' from [IPREDACTED] port 156"      
## [16] "Mar 6 04:56:50 [REDACTED] sshd[2936]: Bad protocol version identification '\\003' from [IPREDACTED] port 185"     
## [17] "Mar 10 00:10:25 [REDACTED] sshd[8761]: Bad protocol version identification '\\003' from [IPREDACTED] port 285"    
## [18] "Aug 20 10:36:43 [REDACTED] sshd[2671]: Bad protocol version identification '\\003' from [IPREDACTED] port 3255" 
## [19] "Aug 20 10:36:43 [REDACTED] sshd[2672]: Bad protocol version identification '\\003' from [IPREDACTED] port 18769"
## [20] "Aug 20 10:36:43 [REDACTED] sshd[2671]: Bad protocol version identification '\\003' from [IPREDACTED] port 3255" 
## [21] "Aug 20 10:36:43 [REDACTED] sshd[2672]: Bad protocol version identification '\\003' from [IPREDACTED] port 18769"
## [22] "Sep 12 11:30:01 [REDACTED] sshd[6444]: Bad protocol version identification '\\003' from [IPREDACTED] port 58895" 
## [23] "Sep 14 17:14:25 [REDACTED] sshd[9926]: Bad protocol version identification '\\003' from [IPREDACTED] port 341"    
## [24] "Sep 12 11:30:01 [REDACTED] sshd[6444]: Bad protocol version identification '\\003' from [IPREDACTED] port 58895" 
## [25] "Sep 14 17:14:25 [REDACTED] sshd[9926]: Bad protocol version identification '\\003' from [IPREDACTED] port 341"
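
Several entries in the output above are exact repeats. One possible cause is the same lines being loaded from more than one log file, though I have not confirmed that. Here is a minimal sketch of the hostname redaction on made-up log lines, followed by `unique()` to collapse repeats. The hostnames and IPs are synthetic, and the IP-masking regex is an assumption, not necessarily how the IPs above were redacted.

```r
# Synthetic log lines; hostnames and IPs are made up (documentation ranges).
raw <- c(
    "Dec 12 14:58:28 myhost sshd[3809]: Bad protocol version identification from 192.0.2.10 port 863",
    "Dec 12 14:58:28 myhost sshd[3809]: Bad protocol version identification from 192.0.2.10 port 863",
    "Feb 8 01:06:05 myhost sshd[27407]: Bad protocol version identification from 198.51.100.7 port 46318"
)

# Same hostname redaction as above: keep the timestamp, replace the next token.
redacted <- sub(
    "([0-9][0-9]:[0-9][0-9]:[0-9][0-9]) [^ ]+ ", "\\1 [REDACTED] ", raw
)

# Assumed IP masking: replace any dotted quad with a placeholder.
redacted <- gsub("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+", "[IPREDACTED]", redacted)

# Collapse exact duplicates.
deduped <- unique(redacted)
```

With the three synthetic lines above, `deduped` keeps two entries.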

All of the IPs appear to belong to specific hosts in Germany, Russia, and Bulgaria. My message to the Bulgarians: “NODNOL 871 SELIM? Thankski Verski Muchski Budski!”

What was odd is that the WHOIS information for the Bulgarian IP address includes an extremely specific physical address and name. It gave a specific apartment number, name, etc., which was easily pulled up on Google Street View. Lots of satellite dishes on the side of the apartment complex! Nice enough city, though. Maybe a slight bit crowded. Very creepy that this can be done today, huh? I'm literally looking at the apartment and surrounding city for someone who likely sent a payload at my server. All of this with PUBLIC tools and PUBLIC information. Technology must be destroyed. This kind of goes to show how sensitive an IP address can be, and why I tend to redact these when publishing things like this (even though he's probably being a naughty boy, I do not know the context of what actually occurred).

This also confirms my suspicions that you should never use your actual IP address, and should instead send all traffic through a VPN connection you trust. ALL traffic. And ALL traffic going over that should be encrypted all the way to the destination in case the VPN provider turns out to be sketchy.

WHOIS data can be too specific sometimes. This gets into a weird area with GDPR, too, since the US has sided with this information being public, and the EU with masking WHOIS information. Might be an interesting factoid to throw into the debate, but who cares about politics anyway? It's just domesticated primates flinging poo at each other. Facts rarely enter the debate, and when they do ideology destroys their purpose. The only way to keep yourself private is to take your privacy into your own hands: don't create data to begin with if you can help it, or mask it well. Better to treat the internet as a more public place than the out of doors.

I'll probably end up using the Rwhois package I made to dig through these IPs next.

Also, another disclaimer. The physical address discovered could be inaccurate or incomplete. I didn't investigate to see if it was a Tor node and there's no way for me to know if it's a VPN this guy runs for his friends or a private collection of clients, or a variety of other circumstances.

Unrumble.

Build Data Frames

Country Code

ipt_data$Country.Code <- geoip(ipt_data$IP.Source, "country_code")$country_code
ipt_country_df <- country_code_cleanup(ipt_data$Country.Code)
ipt_top20 <- ipt_country_df[ipt_country_df$Count >
    tail(head(sort(ipt_country_df$Count, decreasing=TRUE), n=21), n=1),
]
ipt_data$Date.NoTime <- as.POSIXlt(strftime(ipt_data$Date, format="%Y-%m-%d"))
ipt_data$Count <- rep(1, nrow(ipt_data))
agg_country_time <- aggregate(
    Count ~ Country.Code + as.factor(Date.NoTime),
    data=ipt_data, FUN=sum
)
agg_country_time <- country_code_merge(agg_country_time)
names(agg_country_time) <- c(
    "Country.Code", "Date", "Count", "Latitude", "Longitude", "Country.Name"
)
agg_country_time$Date <- as.POSIXlt(agg_country_time$Date)
agg_country_time_top20 <- agg_country_time[
    agg_country_time$Country.Name %in% unique(ipt_top20$Country),
]
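
The top-20 filter above keeps every row whose count is strictly greater than the 21st-largest count. Here is a small sketch of that idea on made-up data, using a top 3 instead of a top 20, next to an `order()`-based alternative. Note that if the nth and (n+1)th counts tie, the cutoff form returns fewer than n rows, while the `order()` form always returns n.

```r
# Made-up counts for illustration only.
toy <- data.frame(
    Country=c("AA", "BB", "CC", "DD", "EE"),
    Count=c(50, 10, 40, 30, 20),
    stringsAsFactors=FALSE
)
n <- 3

# Cutoff form: the (n+1)th-largest count is the threshold to beat.
cutoff <- tail(head(sort(toy$Count, decreasing=TRUE), n=n + 1), n=1)
top_by_cutoff <- toy[toy$Count > cutoff, ]

# Alternative: sort by descending count and take the first n rows.
top_by_order <- head(toy[order(-toy$Count), ], n=n)
```

The cutoff form preserves the original row order, which is handy when the data frame is already arranged the way the plot should be.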

Graphs

IPTables INPUT Table Packet Drops

g <- world_mapper(ipt_country_df)
g <- g + labs(
    title=paste0(
        site_name, ": IPTables: INPUT Table Packet Drops", collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk graph_ipt_geo

IPTables Top 20 Country Barchart

g <- ggplot(ipt_top20, aes(x=Country, y=Count))
g <- g + geom_bar(stat="identity")
g <- g + theme_simple()
g

plot of chunk barchart_country

IPTables Country Timeline

agg_country_time_top20$Date <- as.POSIXct(agg_country_time_top20$Date)
g <- ggplot(agg_country_time_top20, aes(
    x=Date, y=Count, group=Country.Name, colour=Country.Name)
)
g <- g + geom_line() + coord_cartesian(ylim=c(0,10000))
g <- g + theme_simple()
g

plot of chunk timeline_country

Ports

agg_dst_ports <- aggregate(
    Hostname ~ Destination.Port, data=ipt_data, FUN=length
)
names(agg_dst_ports) <- c("Value", "Count")
agg_dst_ports$Value <- as.numeric(as.character(agg_dst_ports$Value))

Non-Ephemeral Tile

non_ephemeral_ports <- heatmap_prep(
    agg_dst_ports[agg_dst_ports$Value < 1024,], 1024, 32
)
names(non_ephemeral_ports) <- c("Destination.Port", "Scale", "X", "Y")
non_ephemeral_graph <- function(data, post_title=""){
    g <- ggplot(data, aes(x=X, y=Y, fill=Scale, label=Destination.Port))
    g <- g + geom_tile() + geom_text()
    g <- g + labs(
        title=paste0(site_name,
            ": IPTables Filtered Non-Ephemeral Destination Ports",
            post_title, collapse=""
        ), x="", y=""
    )
    g <- g + theme_heatmap()
    g <- g + scale_fill_continuous(
        low="#500000", high="#E00000", guide="colorbar"
    )
    g <- g + scale_x_discrete(expand=c(0,0)) + scale_y_discrete(expand=c(0,0))
    g
}
non_ephemeral_graph(non_ephemeral_ports)

plot of chunk graph_non_ephemeral_ports

Truncated at 1000 for visual purposes.

non_ephemeral_ports$Scale[non_ephemeral_ports$Scale > 1000] <- 1000
non_ephemeral_graph(non_ephemeral_ports, " (truncated)")

plot of chunk graph_non_ephemeral_ports_trunc
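
The truncation is a plain cap: any tile above 1000 is drawn as if it were exactly 1000, so a single very busy port no longer washes out the colour scale for everything else. A small sketch on made-up values shows that the indexed assignment used above is equivalent to base R's `pmin()`:

```r
# Made-up tile values; the last one stands in for a heavily hit port.
scale_values <- c(3, 250, 1000, 48213)

# Indexed assignment, as used above.
capped <- scale_values
capped[capped > 1000] <- 1000

# Equivalent one-liner using the element-wise minimum.
capped_pmin <- pmin(scale_values, 1000)
```

Either form loses the true magnitude of the capped tiles, so the untruncated graph is still the one to read absolute counts from.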

Filtered Destination Ports Barchart

common_ports <- head(agg_dst_ports[order(-agg_dst_ports$Count),], n=256)
common_ports <- common_ports[order(common_ports$Value),]
common_ports <- heatmap_prep(common_ports)
names(common_ports) <- c("Destination.Port", "Scale", "X", "Y")
common_ports_graph <- function(data, post_title=""){
    g <- ggplot(data, aes(x=X, y=Y, fill=Scale, label=Destination.Port))
    g <- g + geom_tile() + geom_text()
    g <- g + labs(
        title=paste0(site_name,
            ": IPTables Top 256 Commonly Filtered Destination Ports",
            post_title, collapse=""
        ), x="", y=""
    )
    g <- g + theme_heatmap()
    g <- g + scale_fill_continuous(
        low="#500000", high="#E00000", guide="colorbar"
    )
    g <- g + scale_x_discrete(expand=c(0,0)) + scale_y_discrete(expand=c(0,0))
    g
}
g <- ggplot(common_ports, aes(x=as.factor(Destination.Port), y=Scale))
g <- g + geom_bar(stat="identity")
g <- g + labs(
    title=paste0(site_name,
        ": IPTables Filtered Destination Ports Barchart", collapse=""
    ), x="Port Number (0-65535)", y=""
)
g <- g + theme_simple() %+replace% theme(axis.text.x=element_blank())
g

plot of chunk bar_common_ports

Commonly Filtered Destination Ports Tile

common_ports_graph(common_ports)

plot of chunk graph_common_ports

Truncated at 1000 for visual purposes.

common_ports$Scale[common_ports$Scale > 1000] <- 1000
common_ports_graph(common_ports, " (truncated)")

plot of chunk graph_common_ports_trunc

Graphs for Common Ports

Attacks going after/scanning most commonly attacked or used ports.

22: ssh

ipt_country_22 <- country_code_cleanup(
    ipt_data$Country.Code[ipt_data$Destination.Port == 22]
)
g <- world_mapper(ipt_country_22)
g <- g + labs(
    title=paste0(
        site_name, ": IPTables: INPUT Table Packet Drops (Port 22: ssh)",
        collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk graph_ssh_map

23: telnet

Why are people still using telnet? :(

ipt_country_23 <- country_code_cleanup(
    ipt_data$Country.Code[ipt_data$Destination.Port == 23]
)
g <- world_mapper(ipt_country_23)
g <- g + labs(
    title=paste0(
        site_name, ": IPTables: INPUT Table Packet Drops (Port 23: telnet)",
        collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk graph_telnet_map

445: microsoft-ds

Yucky.

ipt_country_445 <- country_code_cleanup(
    ipt_data$Country.Code[ipt_data$Destination.Port == 445]
)
g <- world_mapper(ipt_country_445)
g <- g + labs(
    title=paste0(
        site_name,
        ": IPTables: INPUT Table Packet Drops (Port 445: microsoft-ds)",
        collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk graph_microsoftds_map

Examining Specific Data

Detecting Port Scans

A port scan is detected if any specific IP address attempts to connect to more than 50 unique destination ports. Under normal usage of my resources, none will occur. One-off connections to random unused ports (for instance, from an incorrect IP address configured somewhere) are cut out of this detection. No legitimate resource should be using more than 50 unique destination ports.

Two detection mechanisms are used in this code. One detects on a per-day basis, to see who is spamming the server (such as: nmap -T insane). The other detects on a long-term basis, catching connections over multiple days from the same IP but to unique destination ports (such as: nmap -T paranoid).
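
To make the two mechanisms concrete, here is a toy version on a synthetic log. All IPs, dates, and ports are made up, and the threshold is lowered to 4 (instead of 50) so the tiny data set can trip it. One source hits six ports in a single day, another spreads the same six ports over three days, and a third retries a single port.

```r
# Synthetic firewall drops; every value here is made up for illustration.
toy <- data.frame(
    IP.Source=c(
        rep("192.0.2.1", 6), rep("198.51.100.2", 6), rep("203.0.113.3", 2)
    ),
    Date=c(
        rep("2019-01-01", 6),
        rep(c("2019-01-01", "2019-01-02", "2019-01-03"), each=2),
        rep("2019-01-01", 2)
    ),
    Destination.Port=c(21, 22, 23, 25, 80, 443, 21, 22, 23, 25, 80, 443, 22, 22),
    stringsAsFactors=FALSE
)
threshold <- 4  # lowered for the toy data; the real analysis uses 50

count_unique <- function(x) length(unique(x))

# Per-day detector: unique ports per source IP per day.
per_day <- aggregate(
    Destination.Port ~ IP.Source + Date, data=toy, FUN=count_unique
)
# Long-term detector: unique ports per source IP over the whole period.
long_term <- aggregate(
    Destination.Port ~ IP.Source, data=toy, FUN=count_unique
)

fast_scanners <- unique(as.character(
    per_day$IP.Source[per_day$Destination.Port > threshold]
))
slow_scanners <- as.character(
    long_term$IP.Source[long_term$Destination.Port > threshold]
)
```

Only the first source shows up in `fast_scanners`, while both the first and the slow, multi-day source show up in `slow_scanners`. The source that retried one port is ignored by both.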

Aggregation

agg_ip_port_date <- aggregate(
    Destination.Port ~ IP.Source + Country.Code + as.factor(Date.NoTime),
    data=ipt_data, FUN=function(x){ length(unique(x)); }
)
names(agg_ip_port_date) <- c(
    "IP.Source", "Country.Code", "Date", "Count"
)
agg_ip_port_date$Count <- as.numeric(as.character(agg_ip_port_date$Count))
agg_ip_port <- aggregate(
    Destination.Port ~ IP.Source + Country.Code,
    data=ipt_data, FUN=function(x){ length(unique(x)); }
)
names(agg_ip_port) <- c("IP.Source", "Country.Code", "Unique.Ports")
agg_ip_port$Unique.Ports <- as.numeric(as.character(agg_ip_port$Unique.Ports))
agg_unique_ip <- aggregate(
    IP.Source ~ Country.Code,
    data=agg_ip_port_date[agg_ip_port_date$Count > 50,], FUN=length
)
unique_ip_map_insane <- country_code_merge(agg_unique_ip)
names(unique_ip_map_insane) <- c("Country.Code", "Count", "X", "Y", "Country")
agg_unique_ip_paranoid <- aggregate(
    IP.Source ~ Country.Code,
    data=agg_ip_port[agg_ip_port$Unique.Ports > 50,], FUN=length
)
unique_ip_map_paranoid <- country_code_merge(agg_unique_ip_paranoid)
names(unique_ip_map_paranoid) <- c("Country.Code", "Count", "X", "Y", "Country")

Raw Data

nmap -T insane port scans:

nrow(agg_ip_port_date[agg_ip_port_date$Count > 50,])
## [1] 609

nmap -T paranoid port scans:

nrow(agg_ip_port[agg_ip_port$Unique.Ports > 50,])
## [1] 799

Top nmap -T insane scan dates:

agg_ip_port_date$Date[agg_ip_port_date$Count > 3000]
## [1] 2019-02-02 2019-02-03 2019-03-01 2019-03-02 2019-03-03 2019-04-01
## [7] 2019-04-05 2019-04-08
## 330 Levels: 2018-10-28 2018-10-29 2018-10-30 2018-10-31 ... 2019-09-22

Mapping Detected Port Scans

g <- world_mapper(unique_ip_map_insane)
g <- g + labs(
    title=paste0(site_name,
        ": IPTables: Detected Port Scans (`nmap -T insane`-like)",
        collapse=""
    ),
    fill="Unique IPs", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk map_port_scans

Mapping Long-Term Detected Port Scans

g <- world_mapper(unique_ip_map_paranoid)
g <- g + labs(
    title=paste0(site_name,
        ": IPTables: Detected Port Scans (`nmap -T paranoid`-like)",
        collapse=""
    ),
    fill="Unique IPs", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk map_port_scans_paranoid

Animations

Data Processing

agg_dst_ports_time <- aggregate(
    Hostname ~ Destination.Port + as.factor(Date.NoTime),
    data=ipt_data, FUN=length
)
names(agg_dst_ports_time) <- c("Value", "Date", "Count")
agg_dst_ports_time$Date <- as.POSIXlt(agg_dst_ports_time$Date)
agg_dst_ports_time$Value <- as.numeric(as.character(agg_dst_ports_time$Value))
agg_dst_ports_time$Count[agg_dst_ports_time$Count > 1000] <- 1000
ipt_map_data <- ipt_data[!is.na(ipt_data$Country.Code),]
anim_geoip <- turn_to_animation(ipt_map_data)
anim_geoip$Count[anim_geoip$Count > 1000] <- 1000
anim_ports <- turn_to_animation(
    agg_dst_ports_time[agg_dst_ports_time$Value < 1024,], "Value", "Count"
)
names(anim_ports) <- c("Animate.Time", "Value", "Count")

anim_ports$Value <- as.numeric(as.character(anim_ports$Value))

anim_ports$Count[is.na(anim_ports$Count)] <- 0
anim_ports <- anim_ports[
    (!is.na(anim_ports$Animate.Time) & !is.na(anim_ports$Value)),
]
names(anim_ports) <- c("Animate.Time", "Destination.Port", "Value")
anim_ports$Animate.Time <- as.POSIXct(anim_ports$Animate.Time)
anim_ports_org <- heatmap_prep(
    anim_ports, 1024, 32,
    date.field="Animate.Time", merge.field="Destination.Port",
    value.ordering=TRUE
)

names(anim_ports_org) <- c(
    "Animate.Time", "Destination.Port", "Scale", "X", "Y"
)

anim_ports_org$Scale <- as.numeric(as.character(anim_ports_org$Scale))
anim_ports_org$Animate.Time <- as.character(strptime(
    anim_ports_org$Animate.Time, format="%Y-%m-%d"
))
common_anim_ports_lbls <- head(
    agg_dst_ports$Value[order(-agg_dst_ports$Count)], n=256
)
common_anim_ports_lbls <- common_anim_ports_lbls[order(common_anim_ports_lbls)]

common_anim_ports <- turn_to_animation(
    agg_dst_ports_time[agg_dst_ports_time$Value %in% common_anim_ports_lbls,],
    "Value", "Count"
)
names(common_anim_ports) <- c("Animate.Time", "Destination.Port", "Value")
common_anim_ports$Animate.Time <- as.POSIXct(common_anim_ports$Animate.Time)
common_anim_ports_org <- heatmap_prep(
    common_anim_ports[
        common_anim_ports$Destination.Port %in% common_anim_ports_lbls,
    ], 256, 16,
    date.field="Animate.Time", merge.field="Destination.Port",
    date.ordering=TRUE, expand.values=common_anim_ports_lbls
)
names(common_anim_ports_org) <- c(
    "Animate.Time", "Destination.Port", "Scale", "X", "Y"
)

Quick Animate Function Wrapper

graph_to_animation <- function(g, x=Inf, y=Inf){
    g <- g + geom_label(
        aes(x=x, y=y, label=Animate.Time),
        vjust="inward", hjust="inward",
        colour="#808080", fill="#FFFFFF", label.size=0
    )
    g <- g + transition_manual(Animate.Time)
    g
}

IPTables INPUT Table Packet Drops

g <- world_mapper(anim_geoip)
g <- g + labs(
    title=paste0(
        site_name, ": IPTables: INPUT Table Packet Drops GeoIP Lookup",
        collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g <- graph_to_animation(g)
options(
    gganimate.fps=5,
    gganimate.nframes=length(levels(as.factor(anim_geoip$Animate.Time)))
)
g