I do historical swordfighting and noticed AI struggles to track it. I’m building an open dataset to help fix this. Does my schema make sense? [P]
Hi everyone,
I’m a historical swordfighter (HEMA practitioner), and while I’m not a computer vision engineer or a roboticist, I’ve been reading a lot about the current bottlenecks in embodied AI, specifically around the Sim2Real gap and thin-object tracking.
It occurred to me that high-level swordfighting is basically a perfect nightmare scenario for computer vision. We move at maximum athletic output, we shift our weight rapidly in non-linear ways (great for bipedal balance testing), we are completely covered in thick, bulky black jackets that hide our joints, and our blade tips can reach around 80 mph, thinning to sub-pixel width or smearing into heavy motion blur.
I think it would be cool to have a computer vision scoring system for tournaments, so I'm putting together a mini-dataset: 100 tightly trimmed clips of these specific physics edge cases, captured with a synchronized multi-view setup at 120/240 fps.
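For a rough sense of scale, here is a back-of-envelope sketch in Python that sets the 80 mph figure against the 120/240 fps capture rates. It assumes the tip really does reach 80 mph; the 1/1000 s shutter and the 500 px-per-metre image scale are hypothetical placeholders, not measurements from this setup.

```python
# Back-of-envelope: how far does a blade tip travel between frames, and how
# long is its motion-blur streak? Shutter time and image scale are assumptions.

MPH_TO_MPS = 0.44704  # exact conversion: 1 mph = 0.44704 m/s


def travel_per_frame_cm(speed_mph: float, fps: float) -> float:
    """Distance the tip moves between consecutive frames, in centimetres."""
    return speed_mph * MPH_TO_MPS / fps * 100.0


def blur_length_px(speed_mph: float, exposure_s: float, px_per_m: float) -> float:
    """Length of the blur streak during one exposure, in pixels.

    px_per_m is a hypothetical image scale (pixels per metre at the fencer's
    distance); in practice it depends on lens, sensor and camera placement.
    """
    return speed_mph * MPH_TO_MPS * exposure_s * px_per_m


if __name__ == "__main__":
    tip_speed_mph = 80.0  # the figure quoted in the post
    for fps in (120, 240):
        cm = travel_per_frame_cm(tip_speed_mph, fps)
        print(f"{fps} fps: tip moves ~{cm:.1f} cm per frame")
    # Hypothetical shutter of 1/1000 s and scale of 500 px per metre.
    print(f"blur streak ~{blur_length_px(tip_speed_mph, 1 / 1000, 500):.1f} px")
```

At 80 mph the tip covers roughly 30 cm between frames at 120 fps and roughly 15 cm at 240 fps, which is why interpolating blade position between frames is hard even before blur is considered.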
Since I'm non-technical, I used some AI assistance to work out how an AI-ready dataset card should be structured, and I've put a placeholder page up on Hugging Face to test the schema before I start shooting video with my clubmates.
Here is the JSON Lines record I'm currently planning to annotate each clip with (pretty-printed here for readability; in the actual .jsonl file each clip is a single line):
```json
{
  "clip_id": "hema_ls_001",
  "meta": { "weapon": "Longsword", "source_text": "Joachim Meyer (1570)", "capture_fps": 120 },
  "time_stamps": { "start_frame": 120, "blade_contact_frame": 165, "recovery_end_frame": 210 },
  "biomechanics": {
    "initial_guard": "Right Vom Tag",
    "ending_guard": "Left Ochs",
    "footwork_type": "Passing step offline",
    "strike_trajectory": "Diagonal Oberhau",
    "edge_alignment": "True edge"
  },
  "computer_vision_hazards": {
    "occlusion_rating": "High (Crossed arms, bulky torso jacket)",
    "motion_blur_expected": true
  },
  "frame_annotations": [
    {
      "frame_index": 165,
      "is_contact_event": true,
      "keypoints_2d_pixel_coordinates": {
        "fencer_a_right_wrist": [412.5, 780.2],
        "fencer_a_left_wrist": [430.1, 795.4],
        "fencer_a_head_center": [425.0, 510.8],
        "fencer_b_right_wrist": [580.4, 765.1],
        "fencer_b_left_wrist": [565.0, 750.3],
        "sword_a_guard": [455.0, 810.0],
        "sword_a_tip": [890.4, 320.1],
        "sword_b_guard": [540.2, 790.6],
        "sword_b_tip": [310.5, 450.2]
      },
      "segmentation_masks": {
        "sword_a_polygon_points": [[455.0, 810.0], [460.1, 805.2], [888.2, 322.5], [890.4, 320.1], [455.0, 810.0]],
        "occluded_pixels_detected": true
      }
    }
  ]
}
```

My questions for the researchers here:
- Does this metadata structure actually give you what you need to test trajectory prediction or pose estimation?
- Are there specific keypoints or measurements (e.g., explicit crossguard coordinates or footwork velocity metrics) that your models are starving for and that I should add to the annotations while I'm doing the manual work anyway?
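Whatever fields end up in the schema, it helps to sanity-check each record before annotating all 100 clips. Below is a minimal Python sketch, assuming the record layout shown above; the checks, the function names (`check_clip`, `clip_timing_s`, `load_jsonl`) and the output keys are hypothetical suggestions, not part of the Hugging Face dataset card or any existing tool. It also shows how the frame-based `time_stamps` convert to seconds via `capture_fps`, the same conversion that would turn keypoint displacements into velocities.

```python
import json

# A trimmed copy of the example record (some keypoints omitted for brevity).
EXAMPLE = {
    "clip_id": "hema_ls_001",
    "meta": {"weapon": "Longsword", "capture_fps": 120},
    "time_stamps": {"start_frame": 120, "blade_contact_frame": 165, "recovery_end_frame": 210},
    "frame_annotations": [
        {
            "frame_index": 165,
            "is_contact_event": True,
            "keypoints_2d_pixel_coordinates": {
                "fencer_a_right_wrist": [412.5, 780.2],
                "sword_a_guard": [455.0, 810.0],
                "sword_a_tip": [890.4, 320.1],
            },
            "segmentation_masks": {
                "sword_a_polygon_points": [[455.0, 810.0], [460.1, 805.2], [888.2, 322.5],
                                           [890.4, 320.1], [455.0, 810.0]],
            },
        }
    ],
}


def check_clip(record: dict) -> list[str]:
    """Return human-readable consistency problems found in one clip record."""
    problems = []
    ts = record["time_stamps"]
    start, contact, end = ts["start_frame"], ts["blade_contact_frame"], ts["recovery_end_frame"]
    if not start <= contact <= end:
        problems.append("time_stamps are not ordered start <= contact <= recovery_end")

    for ann in record.get("frame_annotations", []):
        idx = ann["frame_index"]
        if not start <= idx <= end:
            problems.append(f"frame {idx} lies outside the clip window")
        # A frame flagged as contact should agree with blade_contact_frame.
        if ann.get("is_contact_event") and idx != contact:
            problems.append(f"frame {idx} is flagged as contact but blade_contact_frame is {contact}")
        for name, (x, y) in ann.get("keypoints_2d_pixel_coordinates", {}).items():
            if x < 0 or y < 0:
                problems.append(f"{name} has a negative pixel coordinate")
        # The example polygon repeats its first point at the end to close the shape.
        poly = ann.get("segmentation_masks", {}).get("sword_a_polygon_points", [])
        if poly and poly[0] != poly[-1]:
            problems.append(f"frame {idx}: sword_a polygon is not closed")
    return problems


def clip_timing_s(record: dict) -> dict:
    """Convert the frame-based time stamps to seconds using meta.capture_fps."""
    fps = record["meta"]["capture_fps"]
    ts = record["time_stamps"]
    return {
        "start_to_contact_s": (ts["blade_contact_frame"] - ts["start_frame"]) / fps,
        "contact_to_recovery_s": (ts["recovery_end_frame"] - ts["blade_contact_frame"]) / fps,
    }


def load_jsonl(path: str) -> list[dict]:
    """Read a JSON Lines file: one clip record per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    print(check_clip(EXAMPLE))     # [] for the example record
    print(clip_timing_s(EXAMPLE))  # 0.375 s before contact, 0.375 s after
```

Run over the whole file with `for rec in load_jsonl("clips.jsonl")` (the filename is a placeholder), this would flag things like a contact flag on the wrong frame before they reach anyone's training set.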
You can check out the full dataset description card and leave feedback or join the beta waitlist directly on Hugging Face here: https://huggingface.co/datasets/benito87/longsword-spatial-physics-100
I want to make sure this is actually useful, so any brutal feedback on the structure or parameters is highly appreciated.