EVOLUTION-MANAGER

Edit File: DEVELOP.md

# Development Guide for rdiff-backup

> Suggest [improvements to this documentation](https://github.com/rdiff-backup/rdiff-backup/issues/new?title=Docs%20feedback:%20/docs/DEVELOP.md)!

## GETTING THE SOURCE

Simply clone the source with:

git clone https://github.com/rdiff-backup/rdiff-backup.git

> **NOTE:** If you plan to provide your own code, you should first fork
our repo and clone your own forked repo (probably using ssh not https).
How is described at
https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/working-with-forks

## GENERAL GUIDELINES

- Before committing to a lot of writing or coding, please file an issue on Github and discuss your plans and gather feedback. Eventually it will be much easier to merge your change request if the idea and design has been agreed upon, and there will be less work for you as a contributor if you implement your idea along the correct lines to begin with.
- Please check out [existing issues](https://github.com/rdiff-backup/rdiff-backup/issues) and [existing merge requests](https://github.com/rdiff-backup/rdiff-backup/pulls) and browse the [git history](https://github.com/rdiff-backup/rdiff-backup/commits/master) to see if somebody already tried to address the thing you have are interested in. It might provide useful insight why the current state is as it is.
- Changes can be submitted using the typical Github workflow: clone this repository, make your changes, test and verify, and submit a Pull Request (PR).
- For all code changes, please remember also to include inline comments and
  update tests where needed.

### License

Rdiff-backup is licensed with GNU General Public License v2.0 or later. By
contributing to this repository you agree that your work is licensed using the
chosen project license.

### Branching model and pull requests

The *master* branch is always kept in a clean state. Anybody can at any time
clone this repository and branch off from *master* and expect test suite to pass
and the code and other contents to be of good quality and a reasonable
foundation for them to continue development on.

Each PR focuses on some topic and resist changing anything else. Keeping the
scope clear also makes it easier to review the pull request. A good pull
request has only one or a few commits, with each commit having a good commit
subject and if needed also a body that explains the change.

Each pull request has only one author, but anybody can give feedback. The
original author should be given time to address the feedback – reviewers should
not do the fixes for the author, but instead let the author keep the authorship.
Things can always be iterated and extended in future commits once the PR has
been merged, or even in parallel if the changes are in different files or at
least on different lines and do not cause merge conflicts if worked on.

It is the responsibility of the PR author to keep it without conflict with
master (e.g. if not quickly merged) and overall to support the review process.

Ideally each pull request gets some feedback within 24 hours from it having been
filed, and is merged within days or a couple of weeks. Each author should
facilitate quick reviews and merges by making clean and neat commits and pull
requests that are quick to review and do not spiral out in long discussions.

If something is of interest for the changelog, prefix the statement in the
commit body with a three uppercase letters and a column; which acronym is
not that important but here is a list of recommended ones (see the release
section to understand why it's important):

* FIX: for a bug fix
* NEW: for a new feature
* CHG: for a change requesting consideration when upgrading
* DOC: for documentation aspects
* WEB: anything regarding the website

#### Merging changes to master

Currently the rdiff-backup Github repository is configured so that merging a
pull request is possible only if it:
- passes the CI testing
- has at least one approving review

While anybody can make forks, pull requests and comment them, only a developer
with write access to the main repository can merge and land commits in the
master branch. To get write access, the person mush exhibit commitment to high
standards and have a track record of meaningful contributions over several
months.

It is the responsibility of the merging developer to make sure that the PR
is _squashed_ and that the squash commit message helps the release process
with the right description and 3-capital-letters prefix (it is still the
obligation of the PR author to provide enough information in their commit
messages).

### Coding style

This project is written in Python, and must follow the official [PEP 8 coding
standard](https://www.python.org/dev/peps/pep-0008/) as enforced via the CI
system.

### Versioning

In versioning we utilize git tags as understood by
[setuptools_scm](https://github.com/pypa/setuptools_scm/#default-versioning-scheme).
Version strings follow the [PEP-440
standard](https://www.python.org/dev/peps/pep-0440/).

The rules are currently as follows (check the `.travis.yml` file for details):

- all commits tagged with an underscore at the end or with a tag looking like a
  version number (i.e. as in next two bullets) are released to
  [GitHub](https://github.com/rdiff-backup/rdiff-backup/releases/).
- all commits tagged with alpha, beta, rc or final format are released to
  [PyPI](https://pypi.org/project/rdiff-backup/#history), i.e. the ones looking
  like: vX.Y.ZaN (alpha), vX.Y.ZbN (beta), vX.Y.ZrcN (release candidate) or
  vX.Y.Z (final).
- all commits where the "version tag" is a development one, i.e. like previously
  with an additional `.devM` at the end, are released to 
  [Test PyPI](https://test.pypi.org/project/rdiff-backup/#history).
  They are meant mostly to test the deployment itself (use alpha versions to
  release development code).

> **NOTE:** the GitHub releases are created as draft, meaning that a maintainer
>	must review them and publish them before they become visible.

> **CAUTION:** due to a bug in Travis CI, the Windows wheel can't currently be
>	published to PyPI and needs to be downloaded from GitHub and manually
>	uploaded to PyPI.

## Releases

There is no prior release schedule – they are made when deemed fit.

## BUILD AND INSTALL

### Pre-requisites

The same pre-requisites as for the installation of rdiff-backup also apply for building:

* Python 3.5 or higher
* librsync 1.0.0 or higher
* pylibacl (optional, to support ACLs)
* pyxattr (optional, to support extended attributes) - even if the xattr library (without py) isn't part of our CI/CD pipeline, feel free to use it for your development
* python3-setuptools (for a proper version instead of DEV)

Additionally are following pre-requisites needed:

* python3-dev (or -devel)
* librsync-dev (or -devel)
* a C compiler (gcc)
* python3-setuptools (for setup.py)
* setuptools-scm (also for setup.py, to gather all source files in sdist)
* libacl-devel (for sys/acl.h)
* tox (for testing)
* rdiff (for testing)
* flake8 (optional, but helpful to validate code correctness locally)
* coverage (optional, but helpful to validate test coverage locally)

All of those should come packaged with your system or available from
https://pypi.org/ but if you need them otherwise, here are some sources:

* Python - https://www.python.org/
* Librsync - http://librsync.sourceforge.net/
* Pywin32 - https://github.com/mhammond/pywin32
* Pylibacl - http://pylibacl.sourceforge.net/
* Pyxattr - http://pyxattr.sourceforge.net/

### Build and install using Makefile

The project has a [Makefile](../Makefile) that defines steps like `all`,
`build`, `test` and others. You can view the contents to see what it exactly
does. Using the `Makefile` is the easiest way to quickly build and test the
source code.

By default the `Makefile` runs all of it's command in a clean Docker container,
thus making sure all the build dependencies are correctly defined and also
protecting the host system from having to install them.

The [Travis-CI](https://travis-ci.org/rdiff-backup/rdiff-backup) integration
also uses the `Makefile`, so if all commands in the `Makefile` succeed locally,
the CI is most likely to pass as well.

### Build and install with setup.py

To install, simply run:

python3 setup.py install

The build process can be also be run separately:

python3 setup.py build

The setup script expects to find librsync headers and libraries in the
default location, usually /usr/include and /usr/lib.  If you want the
setup script to check different locations, use the --librsync-dir
switch or the LIBRSYNC_DIR environment variable.  For instance to instruct
the setup program to look in `/usr/local/include` and `/usr/local/lib`
for the librsync files run:

python3 setup.py --librsync-dir=/usr/local build

Finally, the `--lflags` and `--libs` options, and the `LFLAGS` and `LIBS`
environment variables are also recognized.  Running setup.py with no
arguments will display some help. Additional help is displayed by the
command:

python3 setup.py install --help

More information about using setup.py and how rdiff-backup is installed
is available from the Python guide, Installing Python Modules for System
Administrators, located at https://docs.python.org/3/install/index.html

> **NOTE:** There is no uninstall command provided by the Python
distutils/setuptools system. One strategy is to use the `python3 setup.py
install --record <file>` option to save a list of the files installed to `<file>`,
another is to created a wheel package with `python3 setup.py bdist_wheel`,
as it can be installed and deinstalled.

> **NOTE:** if you plan to use `./setup.py bdist_rpm` to create an RPM, you
> would need rpm-build but be aware that it will currently fail due to a [known
> bug in setuptools with compressed man
> pages](https://github.com/pypa/setuptools/issues/1277).

To build from source on Windows, check the [Windows tools](../tools/windows)
to build a single executable file which contains Python, librsync, and
all required modules.

## TESTING

Clone, unpack and prepare the testfiles by calling the script `tools/setup-testfiles.sh`
from the cloned source Git repo. You will most probably be asked for your password
so that sudo can extract and prepare the testfiles (else the tests will fail).

That's it, you can now run the tests:

* run `tox` to use the default `tox.ini`
* or `tox -c tox_slow.ini` for long tests
* or `sudo tox -c tox_root.ini` for the few tests needing root rights

For more details on testing, see the `test` sections in the [Makefile](../Makefile)
and the [.travis-ci.yml definitions](../.travis-ci.yml).

## DEBUGGING

### Trace back a coredump

At the time of writing these notes, there was an issue where calling the program
generates a `Segmentation fault (core dumped)`. This chapter is based on this
experience debugging under Fedora 29.

References:

* https://ask.fedoraproject.org/en/question/98776/where-is-core-dump-located/
* Adventures in Python core dumping: https://gist.github.com/toolness/d56c1aab317377d5d17a
* Debugging dynamically loaded extensions: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch16s08.html
* Debugging Memory Problems: https://www.oreilly.com/library/view/python-cookbook/0596001673/ch16s09.html

> **NOTE:** This assumes gdb was already installed.

1. First install:

sudo dnf install python3-debug
		sudo dnf debuginfo-install python3-debug-3.7.3-1.fc29.x86_64
		sudo dnf debuginfo-install bzip2-libs-1.0.6-28.fc29.x86_64 glibc-2.28-27.fc29.x86_64 \
			librsync-1.0.0-8.fc29.x86_64 libxcrypt-4.4.4-2.fc29.x86_64 \
			openssl-libs-1.1.1b-3.fc29.x86_64 popt-1.16-15.fc29.x86_64 \
			sssd-client-2.1.0-2.fc29.x86_64 xz-libs-5.2.4-3.fc29.x86_64 zlib-1.2.11-14.fc29.x86_64

2. Then run:

python3 ./setup.py clean --all
		python3-debug ./setup.py clean --all
		CFLAGS='-Wall -O0 -g' python3-debug ./setup.py build
		PATH=$PWD/build/scripts-3.7:$PATH PYTHONPATH=$PWD/build/lib.linux-x86_64-3.7-pydebug/ rdiff-backup -v 10 \
			/some/dir1 /some/dir2
		[...]
		Segmentation fault (core dumped)

> **NOTE:** The CFLAGS avoids optimizations making debugging too complicated

At this stage `coredumpctl list` shows that coredump is the last one, so that
one can call `coredumpctl gdb`, which itself tells (in multiple steps) that we
missing some more debug information, hence the above `debuginfo-install`
statements (assuming guess you could install the packages without version
information if you're sure they fit the installed package versions).

So now back into `coredumpctl gdb`, with some commands:

help
	help stack
	backtrace
	bt full
	py-bt
	frame <FrameNumber>
	p <SomeVar>

1. get a backtrace of all function calls leading to the coredump (also `bt`)
2. backtrace with local vars
3. py-bt is the Python version of backtrace
4. jump between frames as listed by bt using their `#FrameNumber`
5. print some variable/expression in the context of the selected frame

Jumping between frames and printing the different variables, we can recognize that:

1. the core dump is due to a seek on a null file pointer
2. that the file pointer comes from the job pointer handed over to the function rs_job_iter
3. the job pointer itself comes from the self variable handed over to `_librsync_patchmaker_cycle`
4. reading through the https://librsync.github.io/rdiff.html[librsync documentation], it appears that the job type is opaque, i.e. I can't directly influence and it has been created via the `rs_patch_begin` function within the function `_librsync_new_patchmaker` in `rdiff_backup/_librsyncmodule.c`.

At this stage, it seems that the core file has given most of its secrets and we need to debug the live program:

$ PYTHONTRACEMALLOC=1 PATH=$PWD/build/scripts-3.7:$PATH PYTHONPATH=$PWD/build/lib.linux-x86_64-3.7-pydebug/ gdb python3-debug
	(gdb) break rdiff_backup/_librsyncmodule.c:_librsync_new_patchmaker
	(gdb) run build/scripts-3.7/rdiff-backup /some/source/dir /some/target/dir

The debugger runs until the breakpoint is reached, after which a succession of `next` and `print <SomeVar>` allows me to analyze the code step by step, and to come to the conclusion that `cfile = fdopen(python_fd, ...` is somehow wrong as it creates a null file pointer whereas `python_fd` looks like a valid file descriptor (an integer equal to 5).

### ResourceWarning unclosed file

If you get something looking like a `ResourceWarning: Enable tracemalloc to get the object allocation traceback`

PYTHONTRACEMALLOC=1 PATH=$PWD/build/scripts-3.7:$PATH \
	PYTHONPATH=$PWD/build/lib.linux-x86_64-3.7-pydebug/ \
		rdiff-backup -v 10 /tmp/äłtèr /var/tmp/rdiff

This tells you indeed where the file was opened: `Object allocated at (most recent call last)` but it still requires deeper analysis to understand the reason.

> **Reference:** https://docs.python.org/3/library/tracemalloc.html

### Debug client / server mode

In order to make sure the debug messages are properly sorted, you need to have the verbosity
level 9 set-up, mix stdout and stderr, and then use the date/time output to properly sort
the lines coming both from server and client, while making sure that lines belonging together
stay together. The result command line might look as follows:

rdiff-backup -v9 localhost::/sourcedir /backupdir 2>&1 | awk \
		'/^2019-09-16/ { if (line) print line; line = $0 } ! /^2019-09-16/ { line = line " ## " $0 }' \
		| sort | sed 's/ ## /\n/g'

### Debug iterators

When debugging, the fact that rdiff-backup uses a lot of iterators makes it
rather complex to understand what's happening. It would sometimes make it
easier to have a list to study at once of iterating painfully through each
_but_ if you simply use `p list(some_iter_var)`, you basically run through
the iterator and it's lost for the program, which can only fail.

The solution is to use `itertools.tee`, create a copy of the iterator and
print the copy, e.g.:

```
(Pdb) import itertools
(Pdb) inc_pair_iter,mycopy = itertools.tee(inc_pair_iter)
(Pdb) p list(map(lambda x: [str(x[0]),list(map(str,x[1]))], mycopy))
[... whatever output ...]
```

Assuming the iteration has no side effects, the initial variable `inc_pair_iter`
is still valid for the rest of the program, whereas the `mycopy` is "dried out"
(but you can repeat the `tee` operation as often as you want).

### Profile rdiff-backup

#### Profiling without code changes

After having called `./setup.py build`, you may call something like the following
to profile the current code (adapt to your Python version):

```
PATH=$PWD/build/scripts-3.8:$PATH PYTHONPATH=$PWD/build/lib.linux-x86_64-3.8 \
	python -m cProfile -s tottime \
	build/scripts-3.8/rdiff-backup [... rdiff-backup parameters ...]
```

The `-s tottime` option _sorts_ by total time spent in the function.
More information can be found in the
[profile documentation](https://docs.python.org/3/library/profile.html).

> **TIP:** if you're into graphical tools and overviews, have a look e.g.
	at  https://pythonhosted.org//ProfileEye/ ?

You may also do memory profiling using the
[memory-profiler](https://pypi.org/project/memory-profiler/),
though more detailed information requires changes to the code by adding
the `@profile` decorator to functions:

```
pip install --user memory-profiler
PATH=$PWD/build/scripts-3.8:$PATH PYTHONPATH=$PWD/build/lib.linux-x86_64-3.8 \
	mprof run \
	build/scripts-3.8/rdiff-backup [... rdiff-backup parameters ...]
mprof plot
mprof clean
```

> **NOTE:** sometimes calling rdiff-backup this way fails, it's due to the
	script having a wrong interpreter (because of wheel building).
	Call `./setup.sh clean --all && ./setup.py build` to fix it.

> **TIP:** there is also a
	[line-profiler](https://pypi.org/project/line-profiler/),
	but I didn't try it because it requires changes to the code
	(again the `@profile` decorator).

#### More profiling with code changes

Once you have found by profiling an object that uses a lot of memory,
one can use `print(sys.getsizeof(x))` to print it's memory footprint then iterating for a code solution to bring it down.

Memory can be freed manually with:

```
import gc
collected_objects = gc.collect()
```

This can also be run in Python:

```
import cProfile, pstats, StringIO 
pr = cProfile.Profile() 
pr.enable() 
# ... do something ... pr.disable() 
s = StringIO.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats(‘cumulative’) 
ps.print_stats() 
print s.getvalue()
```

## RELEASING

We use [Travis CI](https://travis-ci.org) to release automatically, as setup in the [Travis configuration file](../.travis.yml).

The following rules apply:

* each modification to master happens through a Pull Request (PR) which triggers
  a pipeline job, which must be succesful for the merge to have a chance to
  happen. Such PR jobs will _not_ trigger a release.
* GitHub releases are generated as draft only on Git tags looking like a release.
  The release manager reviews then the draft release, names and describes it
  before they makes it visible. An automated Pypi release is foreseen but not
  yet implemented.
* If you need to trigger a job for test purposes (e.g. because you changed
  something to the pipeline), create a branch or a tag with an underscore at
  the end of their name. Just make sure that you remove such tags, and
  potential draft releases, after usage.
* If you want, again for test purposes, to trigger a PyPI deployment towards
  test.pypi.org, tag the commit before you push it with a development release
  tag, like `vA.B.CbD.devN`, then explicitly push the tag and the branch at
  the same time e.g. with `git push origin vA.B.CbD.devN myname-mybranch`.

> **TIP:** Travis will not trigger again on a commit which has already gone
           through the pipeline, even if you add a tag. This applies especially
           to PR commits merged to master without squashing.

Given the above rules, a release cycle looks roughly as follows:

1. Call `./tools/get_changelog_since.sh PREVIOUSTAG` to get a list of changes
   (see above) since the last release and a sorted and unique list of authors,
   on which basis you can extend the [CHANGELOG](../CHANGELOG.md) for the
   new release.
   **IMPORTANT:** make sure that the PR is squashed or you won't be able to
   trigger the release pipeline via a tag on master.
2. Make sure you have the latest master commits with
   `git checkout master && git pull --prune`.
3. Tag the last commit with `git tag vX.Y.ZbN` (beta) or `git tag vX.y.Z" (stable).
4. Push the tag to GitHub with `git push --tags`.
5. You won't see anything in GitHub at first and need to go directly to
   [Travis builds](https://travis-ci.org/rdiff-backup/rdiff-backup/builds) to
   verify that the pipeline has started.
6. If everything goes well, you should see the
   [new draft release](https://github.com/rdiff-backup/rdiff-backup/releases)
   with all assets (aka packages) attached to it after all jobs have finished
   in Travis.
7. Give the release a title and description and save it to make it visible to
   everybody.
8. You'll get a notification e-mail telling you that rdiff-backup-admin has
   released a new version.
9. Use this e-mail to inform the [rdiff-backup users](rdiff-backup-users@nongnu.org).

> **IMPORTANT:** if not everything goes well, remove the tag both locally with
                 `git tag -d TAG` and remotely with `git push -d origin TAG`.
                 Then fix the issue with a new PR and start from the beginning.

> **TIP:** the PyPI deploy pipeline is for now broken under Windows on Travis-CI.
           You may download the Windows wheel(s) from GitHub and upload them to
           PyPI from the command line using twine:
           `twine upload [--repository-url https://test.pypi.org/legacy/] dist/rdiff\_backup-*-win32.whl`

The following sub-chapters list some learnings and specifities in case you need to modify the pipeline.

### Install the Travis client locally

See https://github.com/travis-ci/travis.rb for details, here only the gist of it:

```
ruby -v               # version >= 2
dnf install rubygems  # or zipper, apt, yum...
gem install travis    # as non-root keeps everybody more happy
travis version        # 1.8.10 -> all OK
```

> **NOTE:** installing travis gem also pulls the dependencies multipart-post, faraday, faraday_middleware, highline, backports, net-http-pipeline, net-http-persistent, addressable, multi_json, gh, launchy, ethon, typhoeus, websocket, pusher-client. You might want to install some of them via your preferred package manager instead.

### Create an OAuth key

Use the travis client to generate a secure API key (you can throw away other changes to the `.travis.yml` file). You will need the password of the rdiff-backup-admin, hence only project admins can generate it:

```
$ travis setup releases
Detected repository as rdiff-backup/rdiff-backup, is this correct? |yes| 
Username: rdiff-backup-admin
Password for rdiff-backup-admin: ********************
File to Upload: dist/*
Deploy only from rdiff-backup/rdiff-backup? |yes| 
Encrypt API key? |yes| 
```

The key to add looks then as follows for GitHub deployment (the concrete key shown here isn't valid though):

```
deploy:
  provider: releases
  api_key:
    secure: lqg+HZoy68WudiogbEnOmhxfw9zEJhPOyM4bLJdU2lRBlUZbf0uFvpVJdJqPB7rovKpDknapg4xdXdpbLbD0r/PwsSI9UyFLmyhGn24pnSlrFFjFm2AIQQJUMiCcqsPqNc7fXNMC1BwuM1/RjO3hIxfPxI+A9MSVqW3qhzmerOKXeKFiOLXJ0FkTomRdWGhCEafWO1Ibz5O2d5psK1N/r1ni8kv+E6GPjHk54vmKNcFg8uB7+cPs7ONtW2F+M/h12UVZkC+hy8Bss+esQIMYdVLW5JkKSFfNwKs57qDYYd0lWLzMRti+S+0k/1O6l51BzLY61C4FlRwrMWAy4HIYn5ui39GXIYtGXq9zW+EpYvqTsar+KDU+DGzsr+hAt+eCQpbmZ2SpA7B8Mb3x+BwAcEkvCql789FhWCOd3arUm3H6Ng6yNt50crafJeboHhmitgFQ9uTM7AnXwMnIYVkl6IAZlPkIj20TF1JSdmzpPG2jEJATsMybCuaAuS+ngq4DnJ1axGcclIr4AY9RkSI8EVrL1HTcVLaIH0JnWdO/YC7DSZloC0oswbch1qaW3WsWkJspeaLRvochyFYsatAbvZ46Mzt5uuJUPtSNUVizeb7kBhVGzLVYIepd5XYPgc3Qxp23hu2k9lwg4vjq8WFegC5a34SW/zEZeuFP3HTnD+4=
```

### Delete draft releases

Because there is one draft release created for each pipeline job, it can be quite a lot when one tests the release pipeline. The GitHub WebUI requires quite a lot of clicks to delete them. A way to simplify (a bit) the deletion is to install the command line tool `hub` and call the following command:

```
hub release --include-drafts -f '%U %S %cr%n' | \
	awk '$2 == "draft" && $4 == "days" && $3 > 2 {print $1}' | xargs firefox
```

the `2` compared to `$3` is the number of days, so that you get one tab opened in firefox for each draft release, so that you only need 2 clicks and one Ctrl+W (close the tab) to delete those releases.

> **NOTE:** deletion directly using hub isn't possible as it only supports tags and not release IDs. Drafts do NOT have tags...