The route_localnet parameter does not exist for ipv6.
DNAT to localhost (:: 1) is possible only in the OUTPUT chain.
In the PREROUTING DNAT chain, it is possible to any global address or to the link local address of the same interface
the packet came from.
NFQUEUE works without changes.
## nftables
nftables are fine except one very big problem.
nft requires tons of RAM to load large nf sets (ip lists) with subnets/intervals. Most of the home routers can't afford that.
For example, even a 256 Mb system can't load a 100K ip list. nft process will OOM.
nf sets do not support overlapping intervals and that's why nft process applies very RAM consuming algorithm to merge intervals so they don't overlap.
There're equivalents to iptables for all other functions. Interface and protocol anonymous sets allow not to write multiple similar rules.
Flow offloading is built-in into new linux kernels and nft versions.
nft version `1.0.2` or higher is recommended. But the higher is version the better.
Some techniques can be fully used only with nftables. It's not possible to queue packets after NAT in iptables.
This limits techniques that break NAT.
## When it will not work
* If DNS server returns false responses. ISP can return false IP addresses or not return anything
when blocked domains are queried. If this is the case change DNS to public ones, such as 8.8.8.8 or 1.1.1.1.Sometimes ISP hijacks queries to any DNS server. Dnscrypt or dns-over-tls help.
* If blocking is done by IP.
* If a connection passes through a filter capable of reconstructing a TCP connection, and which
follows all standards. For example, we are routed to squid. Connection goes through the full OS tcpip stack, fragmentation disappears immediately as a means of circumvention. Squid is correct, it will find everything as it should, it is useless to deceive him. BUT. Only small providers can afford using squid, since it is very resource intensive. Large companies usually use DPI, which is designed for much greater bandwidth.
## nfqws
This program is a packet modifier and a NFQUEUE queue handler.
For BSD systems there is dvtws. Its built from the same source and has almost the same parameters (see bsd.eng.md).
--dpi-desync-udplen-increment=<int> ; increase or decrease udp packet length by N bytes (default 2). negative values decrease length.
--dpi-desync-udplen-pattern=<filename>|0xHEX ; udp tail fill pattern
--dpi-desync-start=[n|d|s]N ; apply dpi desync only to packet numbers (n, default), data packet numbers (d), relative sequence (s) greater or equal than N
--dpi-desync-cutoff=[n|d|s]N ; apply dpi desync only to packet numbers (n, default), data packet numbers (d), relative sequence (s) less than N
--hostlist=<filename> ; apply dpi desync only to the listed hosts (one host per line, subdomains auto apply, gzip supported, multiple hostlists allowed)
--hostlist-exclude=<filename> ; do not apply dpi desync to the listed hosts (one host per line, subdomains auto apply, gzip supported, multiple hostlists allowed)
--hostlist-auto=<filename> ; detect DPI blocks and build hostlist automatically
--hostlist-auto-fail-threshold=<int> ; how many failed attempts cause hostname to be added to auto hostlist (default : 3)
--hostlist-auto-fail-time=<int> ; all failed attemps must be within these seconds (default : 60)
--hostlist-auto-retrans-threshold=<int> ; how many request retransmissions cause attempt to fail (default : 3)
--hostlist-auto-debug=<logfile> ; debug auto hostlist positives
--filter-tcp=[~]port1[-port2]|* ; TCP port filter. ~ means negation. setting tcp and not setting udp filter denies udp. comma separated list supported.
--filter-udp=[~]port1[-port2]|* ; UDP port filter. ~ means negation. setting udp and not setting tcp filter denies tcp. comma separated list supported.
Otherwise raw sending SYN,ACK frame will cause error stopping the further processing.
If you realize you don't need the synack mode it's highly suggested to restore drop INVALID rule.
### SYNDATA mode
Normally SYNs come without data payload. If it's present it's ignored by all major OS if TCP fast open (TFO) is not involved, but may not be ignored by DPI.
Original connections with TFO are not touched because otherwise they would be definitely broken.
Without extra parameter payload is 16 zero bytes.
### Virtual Machines
Most of nfqws packet magic does not work from VMs powered by virtualbox and vmware when network is NATed.
Hypervisor forcibly changes ttl and does not forward fake packets.
Set up bridge networking.
### CONNTRACK
nfqws is equipped with minimalistic connection tracking system (conntrack)
It's used if some specific DPI circumvention methods are involved and helps to reassemble multi-packet requests.
Conntrack can track connection phase : SYN,ESTABLISHED,FIN , packet counts in both directions , sequence numbers.
It can be fed with unidirectional or bidirectional packets.
A SYN or SYN,ACK packet creates an entry in the conntrack table.
That's why iptables redirection must start with the first packet although can be cut later using connbytes filter.
First seen UDP packet creates UDP stream. It defines the stream direction. Then all packets with the same
`src_ip,src_port,dst_ip,dst_port` are considered to belong to the same UDP stream. UDP stream exists till inactivity timeout.
A connection is deleted from the table as soon as it's no more required to satisfy nfqws needs or when a timeout happens.
There're 3 timeouts for each connection state. They can be changed in `--ctrack-timeouts` parameter.
`--wssize` changes tcp window size for the server to force it to send split replies.
In order for this to affect all server operating systems, it is necessary to change the window size in each outgoing packet
before sending the message, the answer to which must be split (for example, TLS ClientHello).
That's why conntrack is required to know when to stop applying low window size.
If you do not stop and set the low wssize all the time, the speed will drop catastrophically.
Linux can overcome this using connbytes filter but other OS may not include similar filter.
In http(s) case wssize stops after the first http request or TLS ClientHello.
If you deal with a non-http(s) protocol you need `--wssize-cutoff`. It sets the threshold where wssize stops.
Threshold can be prefixed with 'n' (packet number starting from 1), 'd' (data packet number starting from 1),
's' (relative sequence number - sent by client bytes + 1).
If a http request or TLS ClientHello packet is detected wssize stops immediately ignoring wssize-cutoff option.
If your protocol is prone to long inactivity, you should increase ESTABLISHED phase timeout using `--ctrack-timeouts`.
Default timeout is low - only 5 mins.
Don't forget that nfqws feeds with redirected packets. If you have limited redirection with connbytes
ESTABLISHED entries can remain in the table until dropped by timeout.
To diagnose conntrack state send SIGUSR1 signal to nfqws : `killall -SIGUSR1 nfqws`.
nfqws will dump current conntrack table to stdout.
Typically, in a SYN packet, client sends TCP extension **scaling factor** in addition to window size.
scaling factor is the power of two by which the window size is multiplied : 0=>1, 1=>2, 2=>4, ..., 8=>256, ...
The wssize parameter specifies the scaling factor after a colon.
Scaling factor can only decrease, increase is blocked to prevent the server from exceeding client's window size.
To force a TLS server to fragment ServerHello message to avoid hostname detection on DPI use `--wssize=1:6`
The main rule is to set scale_factor as much as possible so that after recovery the final window size
becomes the possible maximum. If you set `scale_factor` 64:0, it will be very slow.
On the other hand, the server response must not be large enough for the DPI to find what it is looking for.
`--wssize` is not applied in desync profiles with hostlist filter because it works since the connection initiation when it's not yet possible
to extract the host name. But it works with auto hostlist profiles.
`--wssize` may slow down sites and/or increase response time. It's desired to use another methods if possible.
`--dpi-desync-cutoff` allows you to set the threshold at which it stops applying dpi-desync.
Can be prefixed with 'n', 'd', 's' symbol the same way as `--wssize-cutoff`.
Useful with `--dpi-desync-any-protocol=1`.
If the connection falls out of the conntrack and --dpi-desync-cutoff is set, dpi desync will not be applied.
Set conntrack timeouts appropriately.
### Reassemble
nfqws supports reassemble of TLS and QUIC ClientHello.
They can consist of multiple packets if kyber crypto is used (default starting from chromium 124).
Chromium randomizes TLS fingerprint. SNI can be in any packet or in-between.
Stateful DPIs usually reassemble all packets in the request then apply block decision.
If nfqws receives a partial ClientHello it begins reassemble session. Packets are delayed until it's finished.
Then they go through desync using fully reassembled message.
On any error reassemble is cancelled and all delayed packets are sent immediately without desync.
There is special support for all tcp split options for multi segment TLS. Split position is treated
as message-oriented, not packet oriented. For example, if your client sends TLS ClientHello with size 2000
and SNI is at 1700, desync mode is fake,split2, then fake is sent first, then original first segment
and the last splitted segment. 3 segments total.
### UDP support
UDP attacks are limited. Its not possible to fragment UDP on transport level, only on network (ip) level.
Only desync modes `fake`,`hopbyhop`,`destopt`,`ipfrag1` and `ipfrag2` are applicable.
`fake`,`hopbyhop`,`destopt` can be used in combo with `ipfrag2`.
`fake` can be used in combo with `udplen` and `tamper`.
`udplen` increases udp payload size by `--dpi-desync-udplen-increment` bytes. Padding is filled with zeroes by default but can be overriden with a pattern.
This option can resist DPIs that track outgoing UDP packet sizes.
Requires that application protocol does not depend on udp payload size.
QUIC initial packets are recognized. Decryption and hostname extraction is supported so `--hostlist` parameter will work.
Wireguard handshake initiation and DHT packets are also recognized.
For other protocols desync use `--dpi-desync-any-protocol`.
Conntrack supports udp. `--dpi-desync-cutoff` will work. UDP conntrack timeout can be set in the 4th parameter of `--ctrack-timeouts`.
Fake attack is useful only for stateful DPI and useless for stateless dealing with each packet independently.
By default fake payload is 64 zeroes. Can be overriden using `--dpi-desync-fake-unknown-udp`.
### IP fragmentation
Modern network can be very hostile to IP fragmentation. Fragmented packets are often not delivered or refragmented/reassembled on the way.
Frag position is set independently for tcp and udp. By default 24 and 8, must be multiple of 8.
Offset starts from the transport header.
tcp fragments are almost always filtered. It's absolutely not suitable for arbitrary websites.
udp fragments have good chances to survive but not everywhere. It's good to assume success rate on QUIC between 50..75%.
Likely more with your VPS. Sometimes filtered by DDoS protection.
There are important nuances when working with fragments in Linux.
ipv4 : Linux allows to send ipv4 fragments but standard firewall rules in OUTPUT chain can cause raw send to fail.
ipv6 : There's no way for an application to reliably send fragments without defragmentation by conntrack.
Sometimes it works, sometimes system defragments packets.
Looks like kernels <4.16havenosimplewaytosolvethisproblem.Unloadingof`nf_conntrack`module
and its dependency `nf_defrag_ipv6` helps but this severely impacts functionality.
Kernels 4.16+ exclude from defragmentation untracked packets.
See `blockcheck.sh` code for example.
Sometimes it's required to load `ip6table_raw` kernel module with parameter `raw_before_defrag=1`.
In openwrt module parameters are specified after module names separated by space in files located in `/etc/modules.d`.
In traditional linux check whether `iptables-legacy` or `iptables-nft` is used. If legacy create the file
`/etc/modprobe.d/ip6table_raw.conf` with the following content :
```
options ip6table_raw raw_before_defrag=1
```
In some linux distros its possible to change current ip6tables using this command: `update-alternatives --config ip6tables`.
If you want to stay with `nftables-nft` you need to patch and recompile your version.
In `nft.c` find :
```
{
.name = "PREROUTING",
.type = "filter",
.prio = -300, /* NF_IP_PRI_RAW */
.hook = NF_INET_PRE_ROUTING,
},
{
.name = "OUTPUT",
.type = "filter",
.prio = -300, /* NF_IP_PRI_RAW */
.hook = NF_INET_LOCAL_OUT,
},
```
and replace -300 to -450.
It must be done manually, `blockcheck.sh` cannot auto fix this for you.
Or just move to `nftables`. You can create hooks with any priority there.
Looks like there's no way to do ipfrag using iptables for forwarded traffic if NAT is present.
`MASQUERADE` is terminating target, after it `NFQUEUE` does not work.
nfqws sees packets with internal network source address. If fragmented NAT does not process them.
This results in attempt to send packets to internet with internal IP address.
You need to use nftables instead with hook priority 101 or higher.
### multiple strategies
`nfqws` can apply different strategies to different requests. It's done with multiple desync profiles.
Profiles are delimited by the `--new` parameter. First profile is created automatically and does not require `--new`.
Each profile has a filter. By default it's empty and profile matches any packet.
--debug=0|1|2|syslog|@<filename> ; 1 and 2 means log to console and set debug level. for other targets use --debug-level.
--debug-level=0|1|2 ; specify debug level for syslog and @<filename>
--bind-addr=<v4_addr>|<v6_addr>; for v6 link locals append %interface_name : fe80::1%br-lan
--bind-iface4=<interface_name> ; bind to the first ipv4 addr of interface
--bind-iface6=<interface_name> ; bind to the first ipv6 addr of interface
--bind-linklocal=no|unwanted|prefer|force
; no : bind only to global ipv6
; unwanted (default) : prefer global address, then LL
; prefer : prefer LL, then global
; force : LL only
--bind-wait-ifup=<sec> ; wait for interface to appear and up
--bind-wait-ip=<sec> ; after ifup wait for ip address to appear up to N seconds
--bind-wait-ip-linklocal=<sec> ; accept only link locals first N seconds then any
--bind-wait-only ; wait for bind conditions satisfaction then exit. return code 0 if success.
--connect-bind-addr=<v4_addr>|<v6_addr> ; address for outbound connections. for v6 link locals append %%interface_name
--port=<port> ; port number to listen on
--socks ; implement socks4/5 proxy instead of transparent proxy
--local-rcvbuf=<bytes> ; SO_RCVBUF for local legs
--local-sndbuf=<bytes> ; SO_SNDBUF for local legs
--remote-rcvbuf=<bytes> ; SO_RCVBUF for remote legs
--remote-sndbuf=<bytes> ; SO_SNDBUF for remote legs
--nosplice ; do not use splice to transfer data between sockets
--skip-nodelay ; do not set TCP_NODELAY for outgoing connections. incompatible with split.
--local-tcp-user-timeout=<seconds> ; set tcp user timeout for local leg (default : 10, 0 = system default)
--remote-tcp-user-timeout=<seconds> ; set tcp user timeout for remote leg (default : 20, 0 = system default)
--no-resolve ; disable socks5 remote dns
--resolver-threads=<int> ; number of resolver worker threads
--maxconn=<max_connections> ; max number of local legs
--maxfiles=<max_open_files> ; max file descriptors (setrlimit). min requirement is (X*connections+16), where X=6 in tcp proxy mode, X=4 in tampering mode.
; its worth to make a reserve with 1.5 multiplier. by default maxfiles is (X*connections)*1.5+16
--max-orphan-time=<sec> ; if local leg sends something and closes and remote leg is still connecting then cancel connection attempt after N seconds
--hostspell ; exact spelling of "Host" header. must be 4 chars. default is "host"
--hostdot ; add "." after Host: name
--hosttab ; add tab after Host: name
--hostnospace ; remove space after Host:
--hostpad=<bytes> ; add dummy padding headers before Host:
--domcase ; mix domain case after Host: like this : TeSt.cOm
--methodspace ; add extra space after method
--methodeol ; add end-of-line before method
--unixeol ; replace 0D0A to 0A
--tlsrec=sni|sniext ; make 2 TLS records. split at specified logical part. don't split if SNI is not present.
--tlsrec-pos=<pos> ; make 2 TLS records. split at specified pos
--mss=<int> ; set client MSS. forces server to split messages but significantly decreases speed !
--mss-pf=[~]port1[-port2] ; MSS port filter. ~ means negation
--tamper-start=[n]<pos> ; start tampering only from specified outbound stream position. byte pos or block number ('n'). default is 0.
--tamper-cutoff=[n]<pos> ; do not tamper anymore after specified outbound stream position. byte pos or block number ('n'). default is unlimited.
--daemon ; daemonize
--pidfile=<filename> ; write pid to file
--user=<username> ; drop root privs
--uid=uid[:gid] ; drop root privs
```
The manipulation parameters can be combined in any way.
`split-http-req` takes precedence over split-pos for http reqs.
split-pos works by default only on http and TLS ClientHello. use `--split-any-protocol` to act on any packet
tpws can bind to multiple interfaces and IP addresses (up to 32).
Port number is always the same.
Parameters `--bind-iface*` and `--bind-addr` create new bind.
Other parameters `--bind-*` are related to the last bind.
link local ipv6 (`fe80::/8`) mode selection :
```
--bind-iface6 --bind-linklocal=no : first selects private address fc00::/7, then global address
--bind-iface6 --bind-linklocal=unwanted : first selects private address fc00::/7, then global address, then LL
--bind-iface6 --bind-linklocal=prefer : first selects LL, then private address fc00::/7, then global address
--bind-iface6 --bind-linklocal=force : select only LL
```
To bind to all ipv4 specify `--bind-addr "0.0.0.0"`, all ipv6 - `::`.
`--bind-addr=""` - mean bind to all ipv4 and ipv6.
If no binds are specified default bind to all ipv4 and ipv6 addresses is created.
To bind to a specific link local address do : `--bind-iface6=fe80::aaaa:bbbb:cccc:dddd%iface-name`
The `--bind-wait*` parameters can help in situations where you need to get IP from the interface, but it is not there yet, it is not raised
or not configured.
In different systems, ifup events are caught in different ways and do not guarantee that the interface has already received an IP address of a certain type.
In the general case, there is no single mechanism to hang oneself on an event of the type "link local address appeared on the X interface."
To bind to a specific ip when its interface may not be configured yet do : `--bind-addr=192.168.5.3 --bind-wait-ip=20`
It's possible to bind to any nonexistent address in transparent mode but in socks mode address must exist.
In socks proxy mode no additional system privileges are required. Connections to local IPs of the system where tpws runs are prohibited.
tpws supports remote dns resolving (curl : `--socks5-hostname` firefox : `socks_remote_dns=true`) , but does it in blocking mode.
tpws uses async sockets for all activities. Domain names are resolved in multi threaded pool.
Resolving does not freeze other connections. But if there're too many requests resolving delays may increase.
Number of resolver threads is choosen automatically proportinally to `--maxconn` and can be overriden using `--resolver-threads`.
To disable hostname resolve use `--no-resolve` option.
`--disorder` is an additional flag to any split option.
It tries to simulate `--disorder2` option of `nfqws` using standard socket API without the need of additional privileges.
This works fine in Linux and MacOS but unexpectedly in FreeBSD and OpenBSD
(system sends second fragment then the whole packet instead of the first fragment).
`--tlsrec` and `--tlsrec-pos` allow to split TLS ClientHello into 2 TLS records in one TCP segment.
`--tlsrec=sni` splits between 1st and 2nd chars of the hostname. No split occurs if SNI is not present.
`--tlsrec-pos` splits at specified position. If TLS data block size is too small pos=1 is applied.
`--tlsrec` can be combined with `--split-pos` and `--disorder`.
`--tlsrec` breaks significant number of sites. Crypto libraries on end servers usually accept fine modified ClientHello
but middleboxes such as CDNs and ddos guards - not always.
Use of `--tlsrec` without filters is discouraged.
`--mss` sets TCP_MAXSEG socket option. Client sets this value in MSS TCP option in the SYN packet.
Server replies with it's own MSS in SYN,ACK packet. Usually servers lower their packet sizes but they still don't
fit to supplied MSS. The greater MSS client sets the bigger server's packets will be.
If it's enough to split TLS 1.2 ServerHello, it may fool DPI that checks certificate domain name.
This scheme may significantly lower speed. Hostlist filter is possible only in socks mode if client uses remote resolving (firefox `network.proxy.socks_remote_dns`).
`--mss` is not required for TLS1.3. If TLS1.3 is negotiable then MSS make things only worse.
Use only if nothing better is available. Works only in Linux, not BSD or MacOS.
### multiple strategies
`tpws` supports multiple strategies as well. They work mostly like with `nfqws` with minimal differences.
`filter-udp` is absent because `tpws` does not support udp. 0-phase desync methods (`--mss`) can work with hostlist in socks modes with remote hostname resolve.
This is the point where you have to plan profiles carefully. If you use `--mss` and hostlist filters, behaviour can be different depending on remote resolve feature enabled or not.
Use `--mss` both in hostlist profile and profile without hostlist.
Use `curl --socks5` and `curl --socks5-hostname` to issue two kinds of proxy queries.
See `--debug` output to test your setup.
## Ways to get a list of blocked IP
nftables can't work with ipsets. Native nf sets require lots of RAM to load large ip lists with subnets and intervals.
In case you're on a low RAM system and need large lists it may be required to fall back to iptables+ipset.
1. Enter the blocked domains to `ipset/zapret-hosts-user.txt` and run `ipset/get_user.sh`
At the output, you get `ipset/zapret-ip-user.txt` with IP addresses.
2.`ipset/get_reestr_*.sh`. Russian specific
3.`ipset/get_antifilter_*.sh`. Russian specific
4.`ipset/get_config.sh`. This script calls what is written into the GETLIST variable from the config file.
If the variable is not defined, then only lists for ipsets nozapret/nozapret6 are resolved.
So, if you're not russian, the only way for you is to manually add blocked domains.
Or write your own `ipset/get_iran_blocklist.sh` , if you know where to download this one.
On routers, it is not recommended to call these scripts more than once in 2 days to minimize flash memory writes.
donttouch : disable system flow offloading setting if selected mode is incompatible with it, dont touch it otherwise and dont configure selective flow offloading
none : always disable system flow offloading setting and dont configure selective flow offloading
software : always disable system flow offloading setting and configure selective software flow offloading
hardware : always disable system flow offloading setting and configure selective hardware flow offloading
```
`FLOWOFFLOAD=donttouch`
The GETLIST parameter tells the install_easy.sh installer which script to call
to update the list of blocked ip or hosts.
Its called via `get_config.sh` from scheduled tasks (crontab or systemd timer).
Put here the name of the script that you will use to update the lists.
If not, then the parameter should be commented out.
You can individually disable ipv4 or ipv6. If the parameter is commented out or not equal to "1",
use of the protocol is permitted.
```
#DISABLE_IPV4=1
DISABLE_IPV6=1
```
The number of threads for mdig multithreaded DNS resolver (1..100).
The more of them, the faster, but will your DNS server be offended by hammering ?
`MDIG_THREADS=30`
temp directory. Used by ipset/*.sh scripts for large lists processing.
/tmp by default. Can be reassigned if /tmp is tmpfs and RAM is low.
TMPDIR=/opt/zapret/tmp
ipset and nfset options :
```
SET_MAXELEM=262144
IPSET_OPT="hashsize 262144 maxelem 2097152
```
Kernel automatically increases hashsize if ipset is too large for the current hashsize.
This procedure requires internal reallocation and may require additional memory.
On low RAM systems it can cause errors.
Do not use too high hashsize. This way you waste your RAM. And dont use too low hashsize to avoid reallocs.
Its not possible to use nfqws and tpws in transparent proxy mode without root privileges.
Without root tpws can run in --socks mode.
Android has NFQUEUE and nfqws should work.
There's no ipset support unless you run custom kernel. In common case task of bringing up ipset
on android is ranging from "not easy" to "almost impossible", unless you find working kernel
image for your device.
Android does not use /etc/passwd, `tpws --user` won't work. There's replacement.
Use numeric uids in `--uid` option.
Its recommended to use gid 3003 (AID_INET), otherwise tpws will not have inet access.
Example : `--uid 1:3003`
In iptables use : `! --uid-owner 1` instead of `! --uid-owner tpws`.
Nfqws should be executed with `--uid 1`. Otherwise on some devices or firmwares kernel may partially hang. Looks like processes with certain uids can be suspended. With buggy chineese cellular interface driver this can lead to device hang.
Write your own shell script with iptables and tpws, run it using your root manager.
Autorun scripts are here :
magisk : `/data/adb/service.d`
supersu : `/system/su.d`
How to run tpws on root-less android.
You can't write to `/system`, `/data`, can't run from sd card.
Selinux prevents running executables in `/data/local/tmp` from apps.