1 Introduction

While endurance running is popular, beneficial to health [1] and has low barriers to entry, some runners do experience barriers to starting or maintaining participation. Technology could address this problem by monitoring or coaching beginners for a safe and joyful run. As there is a general gender gap in sports participation, with fewer females taking part [2], our research specifically focuses on women.

Respiratory measurement and monitoring is invaluable in many fields, and especially sports, since breathing patterns (BP) are closely correlated to many physiological, psychological, and environmental variables [3]. For example, breathing rate (BR) is a simple metric that is sensitive to rapid changes in physical workload (such as during running), but it is underutilized in sports applications [4]. Many contact-based methods provide practical and accurate measurement of BP during field running. Perhaps the most appropriate field-ready wearable sensors are based on strain or pressure measurement, as they are unobtrusive and relatively robust to motion artifact [5].

Some commercial wearable garments, such as the Hexoskin® (Carre Technologies, Canada), LifeShirt® (VivoMetrics, USA), Tyme Ware\(^{\textrm{TM}}\) smart shirt (Tyme Ware, USA) and QUS Shirt (sanSirro, Austria), can sense respiration and stride data during running in the field with integrated algorithms to detect flow reversal (FR) and step events. Not only can such devices provide accurate estimation of BR, but they can also detect the moment of breath onset (also known as FR) [6]. This detection of breath onset may be especially valuable for evaluating synchronization phenomena unique to running, such as locomotor-respiratory coupling (LRC) [7]. Such synchronized data could be leveraged for biofeedback applications: for example, breathing information, LRC information, or both can be transformed into an auditory signal and fed back in real time to teach runners awareness and control of physiologic functions [8, 9]. According to learning theories such as instrumental or operant conditioning, the provided auditory signal should have reinforcing properties to elicit the intended breathing behavior during running [10]. Through such biofeedback, controlled slower breathing can be induced to trigger positive effects on psychophysiological well-being and to create an enhanced running experience [11].

Textile resistive and capacitive sensors are often used for gathering kinetic data [12]. Whereas resistive sensors have the advantage of high signal resolutions, capacitive measurement methods are more likely to produce high linearity, low hysteresis and high repeatability [13]. Dinh et al. [14] reviewed stretchable respiration sensors including resistive and capacitive measurement technologies and emphasized capacitive pressure sensors’ advantageous insensitivity to external influences. Nevertheless, textile capacitive sensors can be affected by external influences, i.e., temperature and moisture; thus, their textile integration must be robust to protect the signal quality from distortion [15]. In addition, Daley et al. [7] detailed the impact of anthropometric parameters (such as high adiposity) on lung volume and soft tissue artifact. Since FR detection accuracy is likely highly affected by many parameters, algorithm design should be robust.

There are several early FR detection algorithms reported in the literature that are based on minima and maxima or zero-crossing event detection in resting conditions [16, 17]. Recently, Zhong et al. [18] evaluated a textile capacitive respiration sensor during running and reported large motion artifacts in the respiration signal that were caused by the running motion; nevertheless, they reported high accuracy, with the estimated BR deviating by ± 1 bpm from the reference system BR.

Step detection algorithms are often designed for foot- or tibia-mounted inertial measurement units (IMU) as these are the closest locations to foot strike itself; consequently, high accuracies (2 ms–15 ms temporal deviation from the reference system) have been reported [19, 20]. Moreover, the use of waist-worn or smartphone IMUs for step detection is a growing area, especially concerning the development of dead-reckoning algorithms [21]. Step detection algorithms for chest-mounted IMUs have also been investigated and exhibit event detection accuracies above 95% [22,23,24].

Therefore, we aimed to develop a wearable chest-mounted sensor for stride and respiration sensing during running including customized step and FR detection algorithms. This wearable sensor enables further development towards a concurrent respiratory biofeedback system in running. Within this work, we answer the following research questions:

  • Does the new wearable chest sensor enable accurate flow reversal and step detection during running?

  • How does each of the following parameters affect event detection performance: respiration depth, stride motion artifact, thoracic skin temperature, and sweat rate?

2 Methods

2.1 Stride and respiration sensor (SRS)

This work presents a stride and respiration sensor (SRS) similar to that of Bernhart et al. [25]. Figure 1 presents the SRS containing a bespoke smart textile sensor (Grabher Group). This sensor detects respiratory movement of the thorax by measuring pressure between the ribcage and a chest-worn elastic band as capacitive differences. An on-body unit (OBU) prototype was developed by integrating a microcontroller unit (MCU; PSoC\(^{\textrm{TM}}\) 62, Infineon; 2 MB flash, 1 MB SRAM, 1.7–3.6 V) with (1) a capacitive measurement interface (CAPSENSE\(^{\textrm{TM}}\), Infineon) for respiratory sensing [26] and (2) an IMU (MPU6050, DEBO SENS 3AXIS) into a 3D printed box (material: PLA - 1.75 mm, 3D Prima; printer: Prusa i3 MK3, Prusa Research) that is mounted centrally on the chest band. In addition, the OBU (\(12.5 \times 6 \times 2.5\) cm, 163 g) contains a power bank and an SD card for wireless data logging. The textile sensor was connected to the MCU via conductive yarns and push buttons integrated into the OBU, spaced 16–17 cm apart depending on the length of the chest belt (woven polyamide (PA) elastane (EL) (51% recycled PA, 39% PA, 10% EL), adidas), which was adjusted (ca. 70 to 75 cm (unstretched)) to fit different thoracic girths. The textile sensor was placed at the side of the runner’s body to avoid signal interference from breast movement during running. Conversely, the OBU was mounted centrally on the underbust band to (1) avoid distraction from asymmetric weights and (2) minimize the distance between the MCU and the textile sensor, as a longer distance contributes to signal interference. The textile sensor was encapsulated in an adhesive elastic flex foil (POWERFLEX ELASTIC, IVB) to protect it from external influences such as sweat. In total, 10 SRS prototypes were developed and used in the study.

Fig. 1

The (a) stride and respiration sensor (SRS) containing an elastic strap with an integrated (b) smart textile respiration sensor and a centrally mounted OBU containing the MCU and IMU

2.2 Data collection

2.2.1 Instruments

A smartphone mounted at the back of the hip ran a bespoke research application called Breathtool (not openly accessible), which guides runners to perform LRC at pre-determined ratios (e.g., 4:1 steps per breath) with step rate (SR) adaptive audio. Breathtool was developed based on the concepts of Harbour et al. [9] and was used in this study to stabilize runners’ BR via LRC. In addition, runners were equipped with the SRS sampling at 70 Hz. A spiroergometry system (COSMED Quark, Italy) [27] was used to acquire reference respiration data at 20 Hz, and tibia-mounted IMUs (Physilog®, Gait Up, Switzerland) [28] provided reference stride data at 128 Hz. A wireless climate network sensor (WCN2, Hochschule Kaiserslautern, Germany) was also mounted close to the smart textile sensor at the inner side of the belt to record temperature and relative humidity at a sampling rate of 1 Hz, representing thoracic skin temperature and the amount of sweat, respectively. Participants ran on a treadmill (h/p cosmos sports, Germany) in a laboratory to ensure equal conditions.

2.2.2 Study protocol

Before the experiment, participants were briefed on all details of the study and filled in a pre-questionnaire on running experience (see Online Resource 1). Then, the following tasks were performed during the experiment:

  1. Signing informed consent

  2. Anthropometry assessment by the ISAK (International Society for the Advancement of Kinanthropometry) restricted profile [29] by a level one certified expert

  3. Breathtool app familiarization

  4. Synchronization (wireless climate network & reference IMUs & SRS)

  5. 6 min warm-up run at self-selected speed for treadmill and Breathtool sound familiarization during running

  6. 2 min calm-down and synchronization (reference spiroergometry & SRS)

  7. 3 min breathing at rest

  8. 3 min breathing at walking

  9. Four 5 min runs in randomized order with the conditions presented in Table 1 with 1 min breaks for sensor calibration

  10. 2 min calm-down and synchronization (all instruments)

  11. Post-questionnaire regarding SRS comfort and Breathtool instruction user experience (see Online Resource 2)

Table 1 details the expected scenarios of the four runs in the study protocol. Two speed conditions around a participant’s self-selected speed were chosen, as it was speculated that they would result in different levels of soft tissue motion artifact. For some less-experienced runners, treadmill speeds were reduced by an additional 10% each to prevent overexertion and ensure adherence to the instructed BR; however, the relative speed difference was kept consistent between intensity levels. Two LRC ratios were prescribed under the assumption that breathing depth would adjust inversely. An auto-calibration function integrated in the MCU was triggered in the breaks between the runs to correct for respiration signal drift related to external influences, e.g., sweat and temperature. Synchronization gestures were performed before and after recording to produce identifiable signal patterns in the data:

  • wireless climate network sensor and reference IMUs and SRS: Three significant strong motions below a hand dryer to synchronize accelerometers as well as temperature and humidity

  • reference spiroergometry and SRS: Three 10 s breath holds (apnea) with a forced exhale to sync respiration [6]

The recording protocol including breaks took at least 40 min, in which the participants ran for 26 min total (including warm up). The SRSs were washed after each recording. The University Ethics board approved this study and written informed consent was obtained from all participants.

Table 1 Prescribed speeds, LRC ratios and expected experimental conditions

2.2.3 Sample

Nineteen female runners (age: 20–27, mean: 24.0, standard deviation: ± 2.29) participated in this study. Table 2 presents the participants’ anthropometric data. All participants individually estimated their breast sizes in reference to their waist girth on the common scale A to F in the sizes A, B and C (15%, 65% and 20% of the participants, respectively). While 20.8% of participants reported having no running experience, 62.5% indicated running at most twice a month, and 16.7% declared running at least once a week. Running speeds were set between 5.9 and 11.0 km/h for the -10% (speed: 5.9–9.0 km/h, mean: 7.2 km/h, standard deviation: 0.9 km/h) and +10% (speed: 7.1–11.0 km/h, mean: 8.8 km/h, standard deviation: 1.1 km/h) running speed conditions. In total, 76 runs were recorded. Three of these (run 4 of participants P08, P10 and P18) were excluded due to saturation of the SRS breathing sensor signal caused by sweat interference, and two runs (runs 3 and 4 of participant P09) were excluded because of missing data in the reference IMU on the left tibia. Table 3 presents the total number of event samples and their distribution over scenarios.

Table 2 Mean and standard deviation (SD) of participants’ anthropometrics: Sum of six skinfolds (6SFSum in mm), body-mass-index (BMI in \(\frac{kg}{m^2}\)), waist-to-hip ratio (WHR), body mass (in kg) and body height (in cm)
Table 3 Sample sizes of all reference events distributed over scenarios and in total

2.3 Data processing

Reference step events were extracted by applying the step detection algorithm of Aubol et al. [20]. Reference inspiration and expiration events were computed by the algorithm presented by Harbour et al. [6]. Since the product of BR and depth (tidal volume; \(V_T\)) determines ventilation (\(V_E\)), they are inherently interdependent. It was theorized that the prescribed LRC ratios (and, hence, BR) would result in different breathing depths based on \(V_E=BR \cdot V_T\). To quantify the actual effect of these distinct experimental conditions, breathing depth was extracted from reference spiroergometry \(V_T\), while motion artifact was quantified by reference IMU spectral density in a stride frequency band between 2.4 and 3.2 Hz. This quantified motion artifact caused by running is termed stride motion artifact here. In addition, the means and standard deviations of BR and SR for each condition were computed to calculate the LRC mean absolute percentage error (\(MAPE_{LRC}\)) as an estimate of adherence to the instructed LRC. The \(MAPE_{LRC}\) was computed by comparing \(LRC_{actual}=\frac{SR_{actual}}{BR_{actual}}\) with \(LRC_{prescribed}=\frac{SR_{actual}}{BR_{prescribed}}\), where \(BR_{prescribed_{2:2}}=\frac{SR_{actual}}{4}\) for an LRC of 2:2 and \(BR_{prescribed_{3:3}}=\frac{SR_{actual}}{6}\) for an LRC of 3:3. An a priori limit for acceptable adherence was set to 5% \(MAPE_{LRC}\), which conforms to a similar threshold suggested in the literature [5] to sensitively estimate adherence to prescribed breathing rhythms.
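The adherence measure described above can be illustrated with a minimal Python sketch. It assumes that the percentage error is normalised by the prescribed LRC (the text does not state the normalisation explicitly); the function name `mape_lrc` and the input values are hypothetical.

```python
from statistics import mean

# Steps per breath cycle implied by each prescribed ratio (2:2 -> 4, 3:3 -> 6),
# following BR_prescribed = SR_actual / 4 and SR_actual / 6 in the text.
STEPS_PER_BREATH = {"2:2": 4.0, "3:3": 6.0}


def mape_lrc(step_rates_hz, breathing_rates_hz, ratio):
    """Mean absolute percentage error between actual and prescribed LRC.

    Assumption: each error is normalised by the prescribed LRC.
    """
    lrc_prescribed = STEPS_PER_BREATH[ratio]
    errors = []
    for sr, br in zip(step_rates_hz, breathing_rates_hz):
        lrc_actual = sr / br  # steps per breath actually performed
        errors.append(abs(lrc_actual - lrc_prescribed) / lrc_prescribed)
    return 100.0 * mean(errors)


# Hypothetical runs: SR of 2.8 Hz with BR of 0.70 Hz is exactly 4 steps per breath.
example = mape_lrc([2.8, 2.8], [0.70, 0.68], "2:2")
```

With these made-up inputs the result is roughly 1.47%, which would fall below the 5% adherence limit.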

The MCU exhibits a clock drift over time. Therefore, all recording devices were synchronized to a mutual timeline for further data analysis. Synchronization time stamps were extracted from the raw data of the synchronization gestures to remove clock drift and to synchronize the devices to a common starting timestamp without changing the devices’ sampling rates (\(F_S\)). A new time scale was calculated for the sensor devices by first determining the clock drift (\(clk_{drift}\)) and then replacing the time scale of the device:

$$\begin{aligned} clk_{drift}=\frac{(sync_{end_{A}}-sync_{start_A})-(sync_{end_{B}}-sync_{start_B})}{num\_samples_B} \end{aligned}$$
(1)
$$\begin{aligned} time\_scale_B(i)=\{sync_{start_A} + i \cdot clk_{drift} + \frac{i}{F_S}\} \;\; \text{for} \; i \in \mathbb{N}_0 \end{aligned}$$
(2)
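Equations (1) and (2) translate directly into a few lines of Python. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names and the synchronization time stamps are hypothetical, and all times are taken to be in seconds.

```python
def clock_drift(sync_start_a, sync_end_a, sync_start_b, sync_end_b, num_samples_b):
    """Per-sample drift of device B relative to device A, as in Eq. (1)."""
    span_a = sync_end_a - sync_start_a
    span_b = sync_end_b - sync_start_b
    return (span_a - span_b) / num_samples_b


def corrected_time_scale(sync_start_a, drift, fs_b, n):
    """New time stamps for the first n samples of device B, as in Eq. (2)."""
    return [sync_start_a + i * drift + i / fs_b for i in range(n)]


# Hypothetical example: device A sees 100.0 s between the two gestures,
# device B (nominal 70 Hz) sees only 99.9 s, i.e. 6993 sample intervals.
drift = clock_drift(10.0, 110.0, 3.0, 102.9, 6993)
ts = corrected_time_scale(10.0, drift, 70.0, 6994)
```

In this example the last corrected time stamp lands on device A's end-of-recording gesture (110.0 s), which is the purpose of the linear correction; a non-uniform drift would not be removed by it.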

The aligned datasets were segmented into the four runs, with three minutes per run extracted for statistical analysis. The first 30 s of each five-minute run were excluded to avoid using data from the period in which the participants had not yet adhered to the breathing sound instructions. The last 90 s were also excluded because the participants were allowed to select their preferred LRC within this time range for a different study purpose.
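The segmentation rule can be written down as a small helper. This is a sketch with hypothetical function names; it assumes event times are given in seconds on the common timeline and that the run start time is known.

```python
def analysis_window(run_start_s, run_length_s=300.0, skip_start_s=30.0, skip_end_s=90.0):
    """Return (start, end) of the analysed part of one five-minute run."""
    start = run_start_s + skip_start_s
    end = run_start_s + run_length_s - skip_end_s
    return start, end


def select_events(event_times_s, window):
    """Keep only the events that fall inside the analysis window."""
    start, end = window
    return [t for t in event_times_s if start <= t < end]


window = analysis_window(0.0)                         # (30.0, 210.0)
kept = select_events([10.0, 45.0, 200.0, 250.0], window)
```

The default arguments reproduce the three-minute window described above (300 s minus 30 s minus 90 s).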

2.4 Data analysis

The goal of the evaluation was to understand the validity of event detection, the influence of speed and breathing depth on detection performance, and the correlation between algorithm performance and sweat as well as thoracic skin temperature. Therefore, a custom algorithm similar to that of Bernhart et al. [25] was applied to the SRS data to detect step, inspiration and expiration events. True positive (tp), false negative (fn), and false positive (fp) events were identified to represent correctly identified events, missed events, and wrongly identified events by the SRS algorithm versus the reference devices, respectively. The events were aligned using a time window around each reference event whose size was twice the signal period averaged over a run (window size equals \(\frac{2}{SR}\) for step event alignment and \(\frac{2}{BR}\) for FR event alignment) and were annotated by the following rules:

  • tp: An SRS algorithm-detected event exists within the reference event time window.

  • fn: No SRS algorithm-detected event exists within the reference event time window.

  • fp: No reference event exists within the SRS algorithm-detected event time window.
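One possible reading of these annotation rules is sketched below in Python. Several details are assumptions rather than facts from the study: the window is taken to be centred on the event (half-width of one mean period), each detected event may confirm at most one reference event (one way of avoiding duplicate tp events), and surplus detections that still lie inside a reference window are simply dropped. The function name and sample data are hypothetical.

```python
def annotate_events(reference_s, detected_s, mean_rate_hz):
    """Count (tp, fn, fp) for detected events against reference events."""
    half_window = 1.0 / mean_rate_hz  # assumption: 2/rate window centred on the event
    free = sorted(detected_s)         # detections not yet used for a tp
    tp = fn = 0
    for ref in sorted(reference_s):
        in_window = [d for d in free if abs(d - ref) <= half_window]
        if in_window:
            # Use the closest detection once, so it cannot count twice.
            free.remove(min(in_window, key=lambda d: abs(d - ref)))
            tp += 1
        else:
            fn += 1
    # fp rule: no reference event inside the detected event's window.
    fp = sum(
        1 for d in detected_s
        if not any(abs(d - r) <= half_window for r in reference_s)
    )
    return tp, fn, fp


# Hypothetical step events at a mean rate of 1 Hz (times in seconds).
counts = annotate_events([0.0, 1.0, 2.0, 3.0], [0.05, 1.02, 3.01, 5.5], 1.0)
```

Here the reference event at 2.0 s has no detection within its window (fn) and the detection at 5.5 s has no reference event nearby (fp), giving three tp, one fn and one fp.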

Duplicate tp events were removed afterwards. Event detection performance was evaluated by assessing F1-Score (F1) from calculated precision (P) and recall (R) between the custom and reference events:

$$\begin{aligned} P=\frac{tp}{tp+fp}; \; \; R=\frac{tp}{tp+fn}; \; \; F_1=2\cdot \frac{P\cdot R}{P+R} \end{aligned}$$
(3)

Precision indicates the proportion of correctly classified events out of the total number of all (correctly and incorrectly) classified events. On the other hand, recall shows the proportion of correctly classified events in the total number of events that could have been detected. The F1-Score represents the harmonic mean of precision and recall. In addition, the estimated BR deviation of the SRS (BRdev in breaths per minute (bpm)) was determined by \(BR_{dev}=(BR_{actual}-BR_{actual} \cdot F_1) \cdot 60\) for each scenario.
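Equation (3) and the BR deviation formula can be checked with a short Python sketch. The event counts and the breathing rate used here are hypothetical and only serve to show the arithmetic; the function names are not from the study.

```python
def precision_recall_f1(tp, fn, fp):
    """Precision, recall and F1-Score as defined in Eq. (3).

    Assumes tp > 0 so that no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def br_deviation_bpm(br_actual_hz, f1):
    """BR_dev = (BR_actual - BR_actual * F1) * 60, with BR in Hz."""
    return (br_actual_hz - br_actual_hz * f1) * 60.0


# Hypothetical counts: 96 matched events, 4 missed, 1 spurious.
p, r, f1 = precision_recall_f1(tp=96, fn=4, fp=1)
dev = br_deviation_bpm(0.6, f1)  # 0.6 Hz corresponds to 36 breaths per minute
```

For these counts, precision is about 0.990, recall is 0.960 and \(F_1\) is about 0.975, so a runner breathing at 36 bpm would be assigned a deviation of roughly 0.9 bpm.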

The means of these three performance measures across all runs for each detected event (step, expiration and inspiration), speed condition and LRC ratio were plotted to show in which scenario the algorithm performed best. Mann–Whitney U tests [30, 31] were used to evaluate statistically significant differences in event detection performance between LRC ratios (2:2; 3:3) and running speeds (-10%; +10%). Pearson correlation coefficients were calculated between the runners’ mean sweat level and thoracic skin temperature per run and each of the three performance measures. Spearman’s correlation coefficients were used to calculate the correlation between LRC ratio and tidal volume and between speed and stride motion artifact.
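To make the two correlation measures concrete, the following Python sketch computes Pearson's coefficient and obtains Spearman's coefficient as the Pearson correlation of average ranks. It is a didactic illustration using only the standard library, with hypothetical data; it does not compute p-values, and it is not the software used in the study, where a statistics package would typically provide these tests together with the Mann–Whitney U test.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: LRC coded as steps per breath versus tidal volume in litres.
rho = spearman([4, 4, 6, 6], [1.0, 1.2, 1.5, 1.4])
```

Ranking makes the coefficient insensitive to the arbitrary numeric coding of the two LRC levels, which is one reason a rank-based measure suits a two-level factor such as LRC ratio or speed condition.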

3 Results

Figures 2 and 3 present the synchronized signals of the SRS and the reference systems of the acceleration and respiration data, respectively. Step events are detected in the valley before a large peak as it presents the foot strike event. FR events are detected at zero crossing events in the reference spiroergometry data and at minima and maxima in the SRS data.
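The zero-crossing principle mentioned for the reference flow signal can be illustrated with a minimal Python sketch. It is not the reference algorithm of Harbour et al. [6]; it only shows how sign changes of a sampled flow signal can be located by linear interpolation. The function name, the sample values and the sign convention are assumptions.

```python
def flow_reversals(times_s, flow):
    """Return (time, direction) for each sign change of a flow signal.

    direction is +1 for a negative-to-positive crossing and -1 for the
    opposite. Samples that are exactly zero are not treated specially.
    """
    events = []
    for i in range(1, len(flow)):
        a, b = flow[i - 1], flow[i]
        if a * b < 0:  # strict sign change between neighbouring samples
            frac = a / (a - b)  # position of the zero between the two samples
            t = times_s[i - 1] + frac * (times_s[i] - times_s[i - 1])
            events.append((t, 1 if b > 0 else -1))
    return events


# Hypothetical flow samples at 20 Hz (arbitrary units).
events = flow_reversals([0.0, 0.05, 0.10, 0.15], [-1.0, 1.0, 0.5, -0.5])
```

A practical detector would additionally need filtering and a rule against spurious crossings caused by noise around zero flow.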

Fig. 2

Synchronized acceleration data of the SRS and the reference system of both legs including algorithm detected step events (blue dashed vertical lines) and reference step events (red dashed vertical lines). Dashed lines are overlapping each other for the third and last event

Fig. 3

Synchronized respiration data of the SRS and the reference system including algorithm detected (blue dashed vertical lines) and reference (red dashed vertical lines) expiration and inspiration events

Table 4 presents the aggregated tidal volumes, stride motion artifact, BRs, SRs and \(MAPE_{LRC}\) values between the prescribed LRC and actual LRC for each scenario. In addition, Mann–Whitney U tests were calculated to show whether the values of tidal volume per LRC level or the values of stride motion artifact per speed level were statistically different. The results indicate that the selected LRC ratios (\(p=2.16\times 10^{-11}\)) and speed levels (\(p=8.76\times 10^{-5}\)) differed significantly in the resulting tidal volume and stride motion artifact, respectively. The Spearman correlation coefficient between LRC ratio and tidal volume was 0.72 (\(p=1.07\times 10^{-12}\)); thus, the two were positively correlated as theorized. In addition, the correlation between speed and stride motion artifact was 0.46 (\(p=6.24\times 10^{-5}\)), as initially predicted. Both correlations were statistically significant.

Table 5 shows that F1 was highest for the detection of expiration (97.4%) and inspiration events (97.2%), and was slightly lower for the detection of steps (93.2%). Notably, for each of the three events to be detected, precision was highest for the -10% and 3:3 scenario. Recall was highest for scenario +10% and 3:3 for detecting expiration and inspiration events, while scenario +10% and 2:2 shows the best results for step detection. The BR estimated by the SRS deviated by a mean of 0.91 (± 0.51) bpm over all scenarios. On the event level, precision was higher than recall when analyzing expiration (99.0 vs. \(96.1\%\)) and inspiration (99.0 vs. \(95.7\%\)), while the opposite was true for step detection (95.7 vs. \(99\%\)).

Table 6 shows that all Mann–Whitney U tests resulted in p-values larger than 0.05, which implies that none of the event detection performance indicators differed significantly between LRC levels or between speed levels and thus between breathing depth levels or running motion artifact levels, respectively.

Table 7 shows a significant negative moderate correlation between mean sweat level and FR detection performance. In contrast, no significant correlations were found for FR detection and thoracic skin temperature. In addition, there was no observed relationship between thoracic skin temperature or sweat level and step detection performance.

In the questionnaire, 62.5% of the participants reported that they liked the overall comfort of the SRS, while 95.7% felt that the attached electronics had no influence on their breathing. 69.6% of participants reported no distraction to their movement ability, and 91.3% stated that the skin contact of the sensor was comfortable.

Table 4 Mean ± standard deviations of stride motion artifact (SMA in decibel (dB)), mean tidal volume (\(V_T\) in liter (L) at body temperature and pressure saturated (btps)), step rates (SR in Hz), breathing rates (BR in Hz) and LRC mean absolute percentage error (\(MAPE_{LRC}\) in %) calculated for each scenario (speed and LRC)
Table 5 Average value of the performance indicators (in %) over all runs for each event and scenario
Table 6 P-values for Mann–Whitney U tests per event and variable for each performance indicator. This table shows whether the performance indicators are statistically different between the speed or LRC level. A p-value smaller than 0.05 indicates that the respective performance indicator has statistically different values between speed or LRC levels
Table 7 Pearson correlation coefficients (PCC) between performance measures and the mean runners’ sweat level and thoracic skin temperature per run

4 Discussion

The SRS results in a BR deviation of 0.91 (±0.51) bpm which is similar to the results reported in the cited literature. Therefore, these results suggest that the SRS is accurate and functionally sound for beginner runners across a range of typical running scenarios. The four experimental conditions chosen resulted in distinct levels of breathing depth and stride motion artifact characteristic of recreational running. The participants were able to follow the instructed breathing sounds with excellent adherence as the MAPELRC of each scenario was below 5% (see Table 4). No difference in event detection performance was observed between different LRC levels or running speeds, suggesting that the SRS can even detect shallow breathing in the presence of strong running motion artifacts.

The precision of the FR detection was, on average, 99% and recall near 96%, which implies that the algorithm identifies few false FR events but misses relatively more FR events, which causes a slightly underestimated BR. The relatively strong low-pass filter used for FR detection may be the reason for this result. Reducing the filter order or increasing the cut-off frequency of the filter would probably balance the false positive and false negative rates, but may not increase the accuracy of FR detection. Conversely, the step detection precision was lower than the recall; therefore, more step events were wrongly identified than left out. Compared to a previous study [25], the step detection was worse by 7.2 and 5.1 percentage points (pp) in precision and recall, respectively. This may originate from the increased mass of the OBU, which was 2.5 times heavier than a previous version (\(7.1 \times 4.3 \times 2.0\) cm; 65 g). Nevertheless, the FR detection was 7.1 pp higher in precision and 8.9 pp higher in recall versus the previous version. As the algorithms were applied in the same manner, this performance improvement is likely due to the CAPSENSE\(^{\textrm{TM}}\) measurement method (instead of a capacitive voltage division in the previous version) and a more stable integration strategy of the current SRS implementation that increased the signal quality by avoiding movement of the sensor itself. In future work, a signal-to-noise ratio analysis must be conducted to investigate this assumption.

Furthermore, there is suggestive evidence that higher sweat levels negatively affect FR event detection performance. Nevertheless, the sensor robustness against moisture was improved by a novel sensor encapsulation versus previous SRS versions, as accurate FR detection was still retained after 26 min of exertion for all participants except three (whose last run was lost because of sweat issues, corresponding to approx. 4% of all runs recorded). Experiments with previous sensor setups reported challenges with sweat contamination after about 20 min or less [25]. As expected, in this investigation there was no apparent relationship between thoracic skin temperature and FR detection, and step detection performance was also independent of sweat level and thoracic skin temperature.

While FR event detection was sufficiently accurate for all speed and LRC conditions, there was a significant correlation with sweat level. Even though the current sensor’s elastic foil encapsulation is waterproof, sweat ions cause changes in skin capacitance that degrade the quality of the respiratory signal. However, this work improved upon previous concepts substantially by modifying the textile sensor integration strategy for enhanced waterproofing, and the textile sensor requires no skin contact. For the future, a solution for sensor integration without foil and instead with hydrophobic fabrics is planned to improve SRS comfort and, above all, robustness against perspiration. An OBU miniaturization is also ongoing to reduce the bulk of the electronic components for lower obtrusiveness and to increase step detection performance by decreasing the soft tissue motion artifact.

This algorithm evaluation did not report the event detection time lag because after the uniform clock drift was removed, a non-uniform clock drift was identified in the synchronized data. This non-uniform clock drift was small enough to perform an event-based evaluation with the presented alignment strategy by including an enlarged alignment window. However, the non-uniform clock drift prevents a valid time-lag-based evaluation. In future work, the removal of the non-uniform clock drift will be investigated to enable a time-lag-based algorithm evaluation.

This event-detection-based analysis of the algorithm is sufficient to evaluate the suitability of the SRS for BR detection because event time lag has negligible effects on BR calculation. Because it is recommended to smooth BR over a minimum of five previous breathing cycles or 10 s [32], systematic or random event detection time lag is unlikely to cause substantial BR estimation error. Nevertheless, FR or step detection lag profoundly affects estimation of LRC phase synchronization with the SRS or any similar sensor, so a detection-lag-based evaluation is mandatory to establish validity for such purposes.

In addition, this algorithm evaluation did not consider the anthropometric parameters of the participants. A linear regression model with anthropometric factors is not prudent for statistical comparisons because the event detection performance was near 100% for some participants. However, an anthropometric analysis would be enabled in combination with a time-lag-based evaluation or by increasing the sample size with diverse groups of participants.

In the current analysis, the algorithms were applied in a post-processing scenario; in future work, a real-time algorithm evaluation will be investigated. Real-time applications include additional challenges such as speed/accuracy trade-offs, which could result in a correlation between event detection accuracy and event detection delay. For LRC phase coupling estimation and feedback provision, short feedback loops are essential. To minimize distraction to the runner, feedback loops shorter than the human reaction time are recommended [33]; reaction times are 180–200 ms for visual stimuli and 140–160 ms (rarely below 100 ms) for auditory stimuli [34].

5 Conclusion

This paper presented the SRS, a wearable sensor system for stride and respiration detection during running. An IMU and a capacitive textile sensor were integrated into a wearable chest belt to balance performance, robustness, and comfort. The results suggest that the SRS and associated algorithms are valuable because they provide accurate stride and respiration detection independent of running speed and breathing depth. Future work should improve the SRS robustness without sacrificing performance or comfort, and validate it against a reference device for event detection time lag, considering runners’ anthropometrics and LRC relative phase. We speculate that the SRS is likely suitable for real-time LRC feedback during recreational running in the field, pending these next developments.