Text file src/edge-infra.dev/hack/datasync/README.md

### General Info & Setup

This script measures document-replication speed in CouchDB. It queries the CouchDB scheduler endpoint for information about transferred documents and appends the measured speed (documents per second) to an output CSV file.

1. Run `curl -s https://gist.githubusercontent.com/ss186222/2f024da3d7f51c0c61b861bb424dfe33/raw/6dc09ed53872b70e93660f88e29684163131810a/cleanup.sh | bash` to clean the CouchDB cluster. Run this inside your edge-infra folder.
2. Run `kubectl port-forward -n data-sync-couchdb data-sync-couchdb-0 5984:5984` to establish port forwarding.
3. Finally, navigate to `hack/datasync/` and run `python3 speedTest.py --maxjobs 10000 --churn 500 --interval 10000 --username USERNAME --password PASSWORD --serverURL http://localhost:5984/` to start the replication speed test.
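The core of the measurement loop can be sketched as follows. This is a minimal illustration, not the actual `speedTest.py`: it assumes CouchDB's standard `/_scheduler/docs` endpoint, and `docs_per_second` / `fetch_scheduler_docs` are hypothetical helper names.

```python
import json
import urllib.request


def docs_per_second(written_before, written_after, elapsed_seconds):
    """Replication speed: documents written per second over the window."""
    if elapsed_seconds <= 0:
        return 0.0
    return (written_after - written_before) / elapsed_seconds


def fetch_scheduler_docs(server_url, username, password):
    """Query CouchDB's /_scheduler/docs endpoint for replication doc status."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, server_url, username, password)
    opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))
    with opener.open(server_url.rstrip("/") + "/_scheduler/docs") as resp:
        return json.loads(resp.read())
```

The script samples the scheduler output periodically and divides the change in documents written by the elapsed time between samples.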
### Parameters

These flags correspond to the CouchDB replicator scheduler settings configured in `./edge-infra/config/pallets/edge/datasync/couchdb/generic/couchdb-server.yaml`:

- `maxjobs`: Maximum number of replication jobs the scheduler runs concurrently.
- `churn`: Maximum number of running jobs the scheduler stops and replaces with pending jobs during each scheduling interval.
- `interval`: Scheduling interval in milliseconds.
- `username`: Username for authentication.
- `password`: Password for authentication.
- `serverURL`: Server URL (optional, default is `http://localhost:5984/`).
- `filterByDatabase`: Filter results by a specific database ID (optional).

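For reference, the three tuning knobs map onto the `[replicator]` section of a stock CouchDB configuration. How they are nested inside `couchdb-server.yaml` may differ, so treat this as an illustrative sketch with the example values from the command above:

```ini
[replicator]
; Maximum number of replication jobs running concurrently
max_jobs = 10000
; Running jobs stopped and replaced with pending jobs per interval
max_churn = 500
; Scheduling interval in milliseconds
interval = 10000
```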
### Output

The script appends the following fields to the `output.csv` file:

- Timestamp
- Document ID
- Number of documents written
- Speed (documents per second)
- `maxjobs` parameter value
- `churn` parameter value
- `interval` parameter value

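Appending a row with those fields can be sketched like this. It is an illustrative snippet, not taken from `speedTest.py`; `append_result` is a hypothetical helper name.

```python
import csv
from datetime import datetime, timezone


def append_result(path, doc_id, docs_written, speed, maxjobs, churn, interval):
    """Append one measurement row to the output CSV and return the row."""
    row = [
        datetime.now(timezone.utc).isoformat(),  # Timestamp
        doc_id,                                  # Document ID
        docs_written,                            # Number of documents written
        round(speed, 2),                         # Speed (documents per second)
        maxjobs,
        churn,
        interval,
    ]
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)
    return row
```

Opening in append mode (`"a"`) lets repeated runs accumulate results in the same file for later comparison.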
### Data Analysis

The `output.csv` file can be analyzed in multiple ways. You can run [this script](https://gist.github.com/ss186222/588ea911d45e03e6326b8c000d7db40d) to analyze the output data and identify the fastest churn/interval/maxjobs combination.
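A minimal version of that analysis can be sketched as follows, assuming the column order listed under Output above: group rows by parameter triple and rank by mean speed. This is a sketch, not the linked gist.

```python
import csv
from collections import defaultdict


def fastest_params(csv_path):
    """Return the (maxjobs, churn, interval) triple with the highest mean speed."""
    speeds = defaultdict(list)
    with open(csv_path, newline="") as f:
        for _ts, _doc_id, _written, speed, maxjobs, churn, interval in csv.reader(f):
            speeds[(maxjobs, churn, interval)].append(float(speed))
    # Rank parameter triples by their average measured speed
    return max(speeds, key=lambda key: sum(speeds[key]) / len(speeds[key]))
```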
