1 min readfrom Machine Learning

Why Is Table Extraction with VLM Models Still Challenging? [D]

Why Is Table Extraction with VLM Models Still Challenging? [D]
Why Is Table Extraction with VLM Models Still Challenging? [D]

Hey everyone, I’m struggling to find a good approach for converting PDFs to Markdown (especially for financial data). The main challenge is handling borderless tables and tables with more than 5–6 columns. I’ve tried docling, graphite-docling, marker, etc., but haven’t found a solid open-source solution. The only thing that works well so far is LandingAI (but it’s paid).

Does anyone know of a good open-source alternative? TIA!

Sample:

https://preview.redd.it/tajjcvjt5jyg1.png?width=959&format=png&auto=webp&s=8d04c5e946ab361bfef08021f79d106ab62a07cd

https://preview.redd.it/lhpwnbty5jyg1.png?width=630&format=png&auto=webp&s=8dc0475a32b89ce7f8107f3940fd3eb6b0896a3a

submitted by /u/No_Stretch_5809
[link] [comments]

Want to read more?

Check out the full article on the original site

View original article

Tagged with

#financial modeling with spreadsheets
#generative AI for data analysis
#Excel alternatives for data analysis
#natural language processing for spreadsheets
#rows.com
#big data management in spreadsheets
#conversational data analysis
#financial modeling
#real-time data collaboration
#intelligent data visualization
#data visualization tools
#enterprise data management
#big data performance
#data analysis tools
#data cleaning solutions
#table extraction
#VLM models
#PDFs to Markdown
#financial data
#borderless tables