I agree it is not that simple, as I have mentioned above: "Mocap data and its mixing require adjustment and sometimes fixing."
Book tip, even though it's quite old: MoCap for Artists (2008) by Midori Kitagawa and Brian Windsor.
(I ignore frames 0-8, as they are the transition from T-pose to capture.)
The problem might not be solvable by just applying the transform. Let me explain: the two motion captures were done with two dancers of different sizes. The input (Character Definition1; I renamed them, hence the "1" added to the name) has a different size for pretty much every joint. This is why rotation is the only transfer option (except for the root, which contains the position animation). So longer or shorter joints will create different results in, for example, the overall length of the leg.
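To make the length issue concrete, here is a minimal, hypothetical sketch (plain Python with simple 2D forward kinematics, not actual Cinema 4D code; all names and numbers are made up): the same joint rotations, which are all a rotation-only retarget transfers, land the foot in a different place once the bone lengths change.

```python
import math

def foot_position(hip, angles, lengths):
    """Minimal 2D forward kinematics: walk a chain of bones, each
    rotated by its cumulative joint angle (radians), from the hip down."""
    x, y = hip
    total = 0.0
    for angle, length in zip(angles, lengths):
        total += angle
        x += length * math.cos(total)
        y += length * math.sin(total)
    return (x, y)

# Identical joint rotations for both characters (thigh, shin) ...
angles = [math.radians(-80), math.radians(10)]

# ... but two dancers with different bone lengths (illustrative cm values):
source_leg = [45.0, 42.0]   # shorter source performer
target_leg = [52.0, 50.0]   # taller target character

src_foot = foot_position((0.0, 100.0), angles, source_leg)
tgt_foot = foot_position((0.0, 100.0), angles, target_leg)

print("source foot:", src_foot)
print("target foot:", tgt_foot)  # lower and further out: the feet no longer match
```

The longer target leg ends up both lower and further from the hip, which is exactly why rotation-only retargeting between differently sized skeletons needs fixing afterwards.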
Now, one might say: let's do a full transfer to the rig, including the positions, which would also set the length of each joint. So you take a rigged character (Character Definition2) and apply precisely that. What happens is that your character object shrinks, typically leaving you with many crumpled polygons. Certainly not what you want, so not a solution.
In your case, the source sometimes sticks the toes into the floor, around frame 26, for example, just to point out that MoCap data always needs fixing. That you use the floor as a reference leads me to assume the character will get lifted later on, as the toe joints need to sit in the middle of the toe. So, for anyone reading along in the forum: toe joints are not meant to be at floor level when standing.
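The lift mentioned above can be sketched as a tiny, hypothetical helper (plain Python, invented names and values, not Cinema 4D API): if the toe joint should sit at the middle of the toe rather than on the floor, the character needs a vertical offset of roughly half the toe thickness.

```python
def floor_lift(lowest_toe_joint_y, toe_thickness):
    """Vertical offset to add to the character's root so the lowest toe
    joint ends up at half the toe thickness above the floor (y = 0)."""
    target_y = toe_thickness / 2.0
    return target_y - lowest_toe_joint_y

# Example: toe joint currently 0.4 units below the floor, toe 2.0 units thick.
lift = floor_lift(lowest_toe_joint_y=-0.4, toe_thickness=2.0)
print(lift)  # 1.4: move the character up by 1.4 units
```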
The sliding is explained by the character's size, here the legs, of course. Given that the legs rotate, the longer they are, the more distance they naturally cover. Since the position animation stays the same, the two will not match, so the animation path needs to be scaled accordingly. Again, not something that works out with one adjustment or a single click.
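The scaling idea can be sketched as follows, as a minimal assumption-laden example (plain Python; the key format and leg lengths are invented, and a real fix would also need care with vertical motion and turns): the root's horizontal travel is scaled by the leg-length ratio so stride length and leg reach match again.

```python
def scale_root_translation(root_keys, source_leg_len, target_leg_len):
    """root_keys: list of (frame, x, y, z) root positions from the source clip.
    Scales horizontal travel (x, z) by the leg-length ratio; the vertical
    component (y) is left untouched in this simplified sketch."""
    ratio = target_leg_len / source_leg_len
    return [(f, x * ratio, y, z * ratio) for (f, x, y, z) in root_keys]

# Illustrative root keys of a short walk, and two leg lengths in cm:
keys = [(0, 0.0, 90.0, 0.0), (12, 30.0, 90.0, 5.0), (24, 60.0, 90.0, 10.0)]
scaled = scale_root_translation(keys, source_leg_len=87.0, target_leg_len=102.0)
print(scaled)
```

This is only the broad principle behind "scale the animation path"; as said above, in practice it is not a one-click fix.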
It gets even worse: some MoCap data has two, three, or four joints for the spine. I have asked many people over the past years how to get matching results; in summary, I got no "one size fits all" answer. The image below is taken from your file; see the difference?
Screen Shot 2023-06-29 at 1.32.36 PM.jpg
Coming from my first work with "Bones" back in the mid-'90s, with all the options we have today, it is a comparatively simple task; but yes, it still requires some (more or less deep) experience.
Character animation based on MoCap requires fixing in post after the capture; even $100K+ capture volumes lead to that work. So I would not expect to collect data from different sources, models, and methods and mix it all without problems. I'm also not sure I would want to give up that control, as MoCap is a recording of expression that needs direction, not just a technical conversion.
So, there might be room for improvement, but as with everything we use to tell a story expressively, art direction and skill are needed.
However, I understand that you see it differently, which is the seed for pushing it forward. Please voice your wishes here ("Share your ideas!").
Thanks for doing that!
All the best